Abstract

Population-based evolutionary algorithms have been widely applied to solving discrete optimization 
problems. Population scalability describes the relationship between the performance of a population-based
EA and its population size. A question is whether the running time of a population-based EA can
become shorter when its population size increases. This is called the problem of super linear population 
speedup under the hitting time. Although intuitively the performance of an EA may improve if its 
population size increases, currently there exist few theoretical answers and all are based on case studies. 
This paper aims at providing a general study without considering the details of genetic operators and 
fitness functions. A novel approach is introduced to analyse population scalability using the spectral 
radius of the fundamental matrix. The performance of an EA is measured by the asymptotic hitting time 
and then several general results have been proven: (1) Increasing the population size may shorten the 
asymptotic hitting time; (2) No super linear speedup is available when elitist EAs are used to maximize 
any regular monotonic fitness function; (3) Potential super linear speedup may happen when elitist EAs 
are used to maximize deceptive fitness functions; (4) "Bridgeable state" and "diversity preservation" are 
two necessary conditions for a super linear speedup; (5) The "road through bridge" condition is sufficient 
and also necessary for a super linear speedup. 

Keywords: Evolutionary Algorithms, Population Scalability, Markov Chains, Fundamental Matrix, 
Spectral Radius, Hitting Time, Elitist Selection 



1 Introduction 

Population-based evolutionary algorithms (EAs) have been widely applied to solving discrete optimization problems.
A wide range of approaches is available for designing efficient population-based EAs, and using a population
delivers many benefits [1]. An intuition in this field is that the performance of an EA may improve if its
population size increases. Nonetheless, an intuition can sometimes be deceptive. Consider the following
popular intuition as an example: the larger the population size used in an EA, the fewer generations the
EA takes to find an optimal solution. A counterexample, given in Section 5, demonstrates that in some cases
the number of generations it takes to reach an optimal solution may actually increase in the worst case as the
population size increases. A similar phenomenon is also observed in the average case analysis. This raises
an important question: how should intuitive beliefs held in practice be understood and explained? A case-based
answer, like "yes" or "no" for a particular EA and a particular problem instance, is rather unsatisfactory.
Therefore, a general answer is highly desirable, and yet the search for such an answer inevitably leads to a
deep theoretical analysis.

Population scalability describes the relationship between the performance of a population-based EA and 
its population size. Here the EA should be regarded as a family of EAs using the same genetic operators but 
different population sizes. Consider a benchmark EA and another EA in the family. Suppose both EAs use 
the same genetic operators to maximize the same fitness function. Then population scalability is measured 
intuitively by the ratio:

$$\frac{\text{Running time of a benchmark EA}}{\text{Running time of another EA}}. \tag{1}$$

In this paper, the benchmark EA is a $(1+1)$ EA and the other EA is a $(\mu+\mu)$ EA, where $\mu > 1$ is the
population size (an integer). The running time of an EA is usually defined as the expected number of function
evaluations for finding an optimal solution. Notice that in a $(\mu+\mu)$ EA the number of function evaluations is
fixed to $\mu$ at each generation; therefore population scalability can be measured alternatively by population
speedup [5], based on the expected number of generations for finding an optimal solution (called the hitting time):



$$\text{Speedup} = \frac{\text{Hitting time of a } (1+1)\text{-EA}}{\text{Hitting time of a } (\mu+\mu)\text{-EA}}. \tag{2}$$

Since the $(\mu+\mu)$ EA spends $\mu$ function evaluations per generation, it has a shorter running time than the
$(1+1)$ EA if and only if the population speedup is greater than $\mu$. This is called super linear population
speedup. Without super linear speedup, there is no need to use a population. Therefore, the question of when a
super linear population speedup takes place is fundamental in the analysis of population-based EAs.

This paper introduces a novel approach, based on the fundamental matrix of absorbing Markov chains,
to study population scalability of EAs. The idea is simple: many EAs can be modelled by absorbing Markov
chains [3], and then the hitting time of an EA can be investigated through the fundamental matrix. The
spectral radius of the fundamental matrix is used to evaluate the performance of EAs. The spectral radius is
called the asymptotic hitting time. Under this new measure, the question of super linear population speedup
is whether

$$\frac{\text{Asymptotic hitting time of a } (1+1)\text{-EA}}{\text{Asymptotic hitting time of a } (\mu+\mu)\text{-EA}} > \mu. \tag{3}$$

Thereafter this paper focuses on studying the fundamental matrices associated with EAs and their spectral
radii. Using the asymptotic hitting time, several general conditions have been established for super linear
population speedup.

The paper is organized as follows. A review of previous related work is given in Section 2. Then the
notions of elitist EAs, asymptotic hitting time and population scalability are formally introduced in Section 3.
Section 4 lists several preliminary lemmas related to the asymptotic hitting time. In Section 5 we analyze
population scalability of elitist EAs. Section 6 is devoted to a discussion and Section 7 concludes the paper.



2 Related Work 

In evolutionary computation, the study of population scalability can be traced to the early 1990s. Goldberg et
al. presented a population sizing equation to show how a large population size helps an EA to distinguish
between good and bad building blocks on some test problems [4]. Mühlenbein and Schlierkamp-Voosen studied
the critical (minimal) population size that can guarantee convergence to the optimum [5]. Arabas et al.
proposed an adaptive scheme for controlling the population size, and the effectiveness of the proposed scheme
was validated by an empirical study [6]. Eiben et al. reviewed various techniques of parameter control
for EAs, where the adjustment of population size was considered an important research issue [7]. Harik
et al. linked the population size to the quality of solution by the analogy between one-dimensional random
walks and EAs [8]. While the approximate population sizing models proposed by the above investigations
may shed some light on deducing the "promising" population size, the effectiveness of the models has been
validated only by various case studies based upon specific optimization problems.
There do exist a few rigorous results about population scalability. In one of the earliest theoretical
analyses, He and Yao investigated how the running time of EAs varies as the population size increases [2].
Then the link between scalability and parallelism was discussed in [9]. Jansen et al. studied the population
scalability of the $(1+\lambda)$ EA on three pseudo-Boolean functions, Leading-Ones, One-Max and Suf-Samp
[10]. Lässig and Sudholt presented a runtime analysis of a $(1+\lambda)$ EA with an adaptive offspring size
$\lambda$ on several pseudo-Boolean functions [11]. Jägersküpper and Witt analyzed how the running time of a
$(\mu+1)$ EA on the Sphere function scales up with respect to the problem size $n$ [12]. Jansen and Wegener
showed that the running time of the $(\mu+1)$ EA with a crossover operator on the Real Royal Road function is
polynomial on average, while that of an EA with mutation and selection only is exponentially large with an
overwhelming probability [13]. Witt proved theoretically that the running time of the $(\mu+1)$ EA on a specific
pseudo-Boolean function is polynomial with an overwhelming probability when $\mu$ is large enough [14]. Storch
presented a rigorous runtime analysis of the choice of the population size with respect to the $(\mu+1)$ EA on
several pseudo-Boolean functions [15]. Oliveto et al. presented a runtime analysis of both $(1+\lambda)$ and
$(\mu+1)$ EAs on some instances of Vertex Covering Problems [16]. Friedrich et al. analyzed the running time of
$(\mu+1)$ EAs with diversity-preserving mechanisms on the Two-Max problem [17]. For the $(\mu+\mu)$ EA, an upper
bound on the first hitting time has been obtained on two well-known unimodal problems, Leading-Ones and
One-Max [18]. Nonetheless, all the available results are mainly restricted to several simple algorithms for
simple problems. In other words, the up-to-date knowledge is limited to case studies only [19].

In contrast with the previous investigations, the current paper aims at providing a general study of 
population scalability of EAs. It is expected that the analysis may cover EAs as widely as possible. In 
theory, it is rather challenging to study and compare variants of EAs together [20], since they exploit different
genetic operators, parameter controls, adaptation mechanisms and so on. Nonetheless, a wide family of elitist
EAs can be modeled by the transition probability matrices representing mutation and selection, thereby
providing a convenient mathematical framework in which to conduct a general study.

In the general study of EAs, a major difficulty is the calculation of the hitting time. Without knowing the 
hitting time, it is difficult to estimate population scalability or to pin down the conditions under which the 
super linear population scalability occurs. In order to overcome the difficulty, this paper introduces a novel 
approach based on the fundamental matrix. The fundamental matrix approach can be traced back to Fogel's 
early work on asymptotic convergence properties of genetic algorithms and evolutionary programming [21].
It has been applied in analysing elitist EAs by He and Yao [22].

3 Formalization of Evolutionary Algorithms and Population Scalability

3.1 Formalization of EAs 

Consider the discrete optimization problem of maximizing a fitness function $f(x)$:

$$\max f(x), \quad x \in S,$$

where $S$ is a finite set. For example, $S$ is the set of all tours in a travelling salesman problem, or the set of
all vertex covers in a vertex covering problem.

Let the fitness function take $L+1$ values $f_0 > f_1 > \cdots > f_L$, which are called fitness levels.

For the convenience of theoretical analysis, suppose that all constraints in the above problem have been 
removed through a penalty function method. Under this circumstance, all solutions in S are thought to be 
feasible. Practical or theoretical analysis of constraint handling in evolutionary computation can be found in 
references such as [23, 24].

The procedure of a $(\mu+\mu)$ EA is described in Algorithm 1. The notation used in the algorithm is
introduced below. Table 1 lists most of the notation used in the paper.

• $S^{(1)} := S$ is called the individual state space; the Cartesian product $S^{(\mu)} := \prod_{i=1}^{\mu} S$ is
called a population state space if $\mu \ge 2$.

When necessary, superscripts $(1)$ and $(\mu)$ will be added in order to distinguish between a $(1+1)$ EA
and a $(\mu+\mu)$ EA.
Table 1: Notation Table

Symbol                                      Meaning
$S^{(1)}$                                   the individual state space, $= S$
$S^{(\mu)}$                                 the Cartesian product $\prod_{i=1}^{\mu} S$, called a population state space if $\mu \ge 2$
superscripts $(1)$, $(\mu)$                 distinguish between a $(1+1)$ EA and a $(\mu+\mu)$ EA
$x, y, z$                                   individual states
$X, Y, Z$                                   population states
$S_{opt}^{(1)}$                             the optimal set, consisting of all individual states which are an optimal solution
$S_{non}^{(1)}$                             the non-optimal set, $= S^{(1)} \setminus S_{opt}^{(1)}$
$S_{opt}^{(\mu)}$                           the optimal set, consisting of all population states which contain at least one optimal solution
$S_{non}^{(\mu)}$                           the non-optimal set, $= S^{(\mu)} \setminus S_{opt}^{(\mu)}$
$S_{same}(x)$                               the set consisting of all population states whose super individual is $x$
$S_{high}(x)$                               the set of population states whose super individual has a higher fitness than $f(x)$
$P(X, Y)$                                   the transition probability from $X$ to $Y$
$P(X, S_{same}(x))$                         the probability of $X$ staying in the set $S_{same}(x)$
$P(X, S_{high}(x))$                         the probability of $X$ entering the set $S_{high}(x)$
$\max_x P(x, x)$                            the largest probability of remaining in any non-optimal state, where $x$ is a non-optimal state
$t$                                         generation counter
$\Phi_t$                                    the $t$-generation population, $= (\phi_{t,1}, \cdots, \phi_{t,\mu})$
$\Phi_{t+1/2}$                              the children of $\Phi_t$ after mutation, $= (\phi_{t+1/2,1}, \cdots, \phi_{t+1/2,\mu})$
$p_t(X)$                                    the probability of the $t$-generation population being at state $X$
$\mathbf{p}_t$                              the probability distribution of the $t$-generation population
$m(X)$                                      the hitting time starting from state $X$
$\mathbf{m}$                                the vector of hitting times for all non-optimal states
$\mathbf{T}$                                the transition matrix for transitions within the non-optimal set
$\mathbf{P}$ restricted to $S_{same}(x)$    the transition matrix for transitions within the set $S_{same}(x)$
$\mathbf{N}$                                the fundamental matrix, $= (\mathbf{I} - \mathbf{T})^{-1}$
$\lambda$                                   an eigenvalue of a matrix
$\rho(\mathbf{T})$                          the spectral radius of matrix $\mathbf{T}$
$\rho(\mathbf{N})$                          the spectral radius of the fundamental matrix $\mathbf{N}$, called the asymptotic hitting time
$speedup(\mu)$                              the population speedup between a $(1+1)$ EA and a $(\mu+\mu)$ EA



• $x, y, z \in S^{(1)}$ denote states in an individual state space, and $X, Y, Z \in S^{(\mu)}$ states in a population
state space. A population $X$ consists of $\mu$ individuals $(x_1, \cdots, x_\mu)$ in the order of their fitness:
$f(x_1) \ge \cdots \ge f(x_\mu)$. The first individual is called the super individual of the population. The rest of
the individuals are called non-super individuals.

• $t$ is the generation counter. $\Phi_t = (\phi_{t,1}, \cdots, \phi_{t,\mu})$ represents the $t$-generation population.

The corresponding definitions of mutation and elitist selection operators appear below.

• A mutation operator is a transition from $S^{(1)}$ to $S^{(1)}$, where the transition probability

$$\Pr(\phi_{t+1/2} = y \mid \phi_t = x)$$

represents the probability of going from state $x$ to state $y$.

• A selection operator is a transition from $S^{(\mu)} \times S^{(\mu)}$ to $S^{(\mu)}$, where the transition probability

$$\Pr(\Phi_{t+1} = Z \mid \Phi_t = X, \Phi_{t+1/2} = Y)$$

represents the probability of going from states $X$ and $Y$ to state $Z$.

A requirement is that all individuals in $Z$ must come from $X$ or $Y$. In other words, if $Z$ contains an
individual not in $X$ or $Y$, then the probability of going from states $X$ and $Y$ to state $Z$ is 0.
Algorithm 1 A $(\mu+\mu)$ EA
input: fitness function;
generation counter $t \leftarrow 1$;
initialize $\Phi_t$;
while (no optimal solution is found) do
    $\Phi_{t+1/2} \leftarrow$ each individual in $\Phi_t$ generates a child by mutation;
    evaluate the fitness of each individual in $\Phi_{t+1/2}$;
    $\Phi_{t+1} \leftarrow$ elitist selection from $(\Phi_t, \Phi_{t+1/2})$;
    $t \leftarrow t + 1$;
end while
output: the maximal value of the fitness function.
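
As a rough illustration of Algorithm 1, the following Python sketch runs a $(\mu+\mu)$ elitist EA on One-Max. The paper leaves the genetic operators unspecified, so the bitwise mutation rate $1/n$, the use of truncation selection as the elitist rule, the generation cap `max_gens`, the generation count starting at 0, and all function names are assumptions made only for this example.

```python
import random


def one_max(x):
    """Fitness: number of one-bits in the bit tuple x."""
    return sum(x)


def bitwise_mutation(x, rng):
    """Flip each bit independently with probability 1/n (assumed rate)."""
    n = len(x)
    return tuple(b ^ 1 if rng.random() < 1.0 / n else b for b in x)


def elitist_selection(parents, children, fitness, mu):
    """Keep the mu fittest of parents + children (truncation selection).

    Python's sort is stable and parents are listed first, so the parent
    super individual is replaced only by a strictly fitter child.
    """
    pool = list(parents) + list(children)
    pool.sort(key=fitness, reverse=True)
    return pool[:mu]


def mu_plus_mu_ea(n, mu, seed=0, max_gens=100_000):
    """Return (generations used, best individual) for One-Max on n bits."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(mu)]
    pop.sort(key=one_max, reverse=True)        # pop[0] is the super individual
    t = 0
    while one_max(pop[0]) < n and t < max_gens:
        children = [bitwise_mutation(x, rng) for x in pop]
        pop = elitist_selection(pop, children, one_max, mu)
        t += 1
    return t, pop[0]
```

A call such as `mu_plus_mu_ea(n=8, mu=4, seed=1)` returns one sample of the number of generations needed to hit the optimum; averaging over many seeds would estimate the hitting time for that initial distribution.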



It is allowed that individuals in $Z$ are selected from $Y$ only. Such a selection operator is traditionally
called $(\mu, \mu)$. For simplicity, $(\mu+\mu)$ EAs in the current paper represent both traditional $(\mu+\mu)$ and
$(\mu, \mu)$ EAs.

Many mutation operators can be abstracted into a transition from one state to another state. For example,
mutation could be from a tour to another tour in the travelling salesman problem, or from a node cover to another
node cover in the node covering problem. The above definition of selection operators actually covers both
elitist and non-elitist selection. Both mutation and selection operators are assumed to be time-independent.
This paper will not analyse EAs with time-adaptive operators.

The stopping criterion is that the algorithm will terminate once an optimal solution is found. This
criterion is taken only for the convenience of analysing the hitting time, which is commonly done without
loss of generality [25]. If an EA cannot find the optimal solution, it runs forever. This paper will not discuss
this kind of EA.

Remark In practice, prior information about the optimal solution usually is unavailable. Hence even if an 
optimal solution is found, it could be lost again. This is still under the current framework, since in theory 
the EA is assumed to halt once it finds an optimal solution. 

The mathematical framework introduced above incorporates a wide class of EAs as it doesn't assume any 
implementation detail. Moreover, the above formalization does cover adaptive operators. 

• Adaptive Mutation: the mutation transition probability

$$\Pr(\phi_{t+1/2} = y \mid \phi_t = x)$$

adapts to state $x$.

• Adaptive Selection: the selection transition probability

$$\Pr(\Phi_{t+1} = Z \mid \Phi_t = X, \Phi_{t+1/2} = Y)$$

adapts to states $X$ and $Y$.

The number of approaches to designing such adaptive mutation and selection operators is countless.
Several examples appear below.

Example 1. An example of adaptive mutation.

Consider an EA using bitwise mutation for maximizing the One-Max function:

$$f(x) = \sum_{i=1}^{n} b_i, \tag{4}$$

where $x = (b_1 \cdots b_n)$ is a binary string of length $n$.

If the string $x$ includes $k$ zero-valued bits, then flip each bit independently with probability $p(x) = k/n$.

In this example, if a population is $X = (0001, 0111)$, then the probability of flipping each bit is 3/4 for the
individual 0001, and 1/4 for 0111.
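
The state-dependent rate of Example 1 can be sketched in a few lines of Python. Representing individuals as strings of `'0'` and `'1'` and the function names below are choices made for this illustration only.

```python
import random


def adaptive_flip_probability(x):
    """p(x) = k / n, where k is the number of zero-valued bits in x."""
    return x.count("0") / len(x)


def adaptive_mutation(x, rng=random):
    """Flip each bit of x independently with the state-dependent rate p(x)."""
    p = adaptive_flip_probability(x)
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < p else b
        for b in x
    )
```

The rate depends only on the current state `x`, not on the generation counter, which is why this operator still fits the time-independent framework.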



Example 2. An example of adaptive mixed strategy mutation [26].

Given 100 different mutation operators, at each generation, each individual will select a single mutation 
operator according to a probability distribution on the collection of mutation operators and then apply it. 

In this example, different mutation operators may be applied in different areas on a fitness landscape. 
The probability distribution of choosing mutation operators adapts to the state x, yet it is time-independent 
(i.e. it does not depend on the number of iterations having been made so far). 

Example 3. An example of adaptive selection.

Consider an EA using bitwise mutation for maximizing the Fully-Deceptive function:

$$f(x) = \begin{cases} n + 1, & \text{if } \sum_{i=1}^{n} b_i = 0, \\ k, & \text{if } \sum_{i=1}^{n} b_i = k, \ k = 1, \cdots, n, \end{cases} \tag{5}$$

where $x = (b_1 \cdots b_n)$ is a binary string of length $n$.

Given two populations $\Phi_t = X$ and $\Phi_{t+1/2} = Y$, if individuals in $X \cup Y$ are too similar, then apply a
selection operator with a lower selection pressure; otherwise with a higher pressure.

In this example, given two populations $X = (0111, 0111)$ and $Y = (0111, 0001)$, the probability of selecting
0001 into the next parent population could be set to 1. But for $X = (0111, 0100)$ and $Y = (0101, 0001)$,
the probability of selecting 0001 into the next parent population could be set to 1/3.
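
For concreteness, the Fully-Deceptive function (5) can be written as the short Python sketch below; the string representation and the function name are assumptions of this illustration.

```python
def fully_deceptive(x):
    """Fully-Deceptive fitness: n + 1 for the all-zero string, otherwise the number of ones."""
    ones = x.count("1")
    return len(x) + 1 if ones == 0 else ones
```

With $n = 4$, the individual 0111 has fitness 3 while 0001 has fitness 1, yet 0001 is only one bit flip away from the optimum 0000. This is why admitting 0001 into the next parent population, as in the example above, can be useful.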

Example 4. An example of adaptive mixed strategy selection. 

Given 100 different selection operators, at each generation, a single selection operator is chosen according 
to a probability distribution and then applied. 

In this example, different selection operators may be chosen at different generations, but when encountering
two identical populations at different generations, the probability of choosing a specified selection operator
remains the same. Thereby, the probability distribution adapts to $X$ and $Y$, but it is time-independent.

All mutation and selection operators in the examples above can be reformulated in terms of probability
transitions which are time-independent.

3.2 Markov Chains associated with EAs 

Throughout the paper, suppose that the EAs under consideration can reach an optimal state starting from every
state. According to the stopping criterion, the EA halts once an optimal solution is found, so if $\Phi_t = X$ is
an optimal state, then let $\Phi_{t+1} = \Phi_{t+2} = \cdots = X$ forever. Thus the sequence $\{\Phi_t, t = 0, 1, \cdots\}$ can be
modelled by an absorbing Markov chain [27]. Let $\mathbf{P}$ be its transition probability matrix, having entries

$$P(X, Y) := \Pr(\Phi_{t+1} = Y \mid \Phi_t = X).$$
The probability of going from $X$ to a set $S_{set}^{(\mu)}$ is denoted by

$$P(X, S_{set}^{(\mu)}) := \Pr(\Phi_{t+1} \in S_{set}^{(\mu)} \mid \Phi_t = X).$$

The hitting time represents the first time when the EA reaches the optimal set. It is formally defined as
follows.

Definition 1. Let $\tau$ be the first generation at which the EA encounters an optimal solution when starting
from a specified initial population $X$. The hitting time $m^{(\mu)}(X)$ of the EA is defined to be the expected
value of $\tau$, i.e.,

$$m^{(\mu)}(X) := \sum_{t=0}^{+\infty} t \Pr(\tau = t). \tag{6}$$



Footnote: An absorbing Markov chain is a Markov chain in which every state can reach an absorbing state. An
absorbing state is a state that cannot be left once entered. See wikipedia.org under the entry "Absorbing
Markov chain". In EAs, an optimal state is absorbing and a non-optimal state is transient.
Remark In the general study, it is impossible to calculate the worst case or average case hitting time, since
neither EAs nor the fitness functions are specified. Thereby, it becomes more difficult to study population 
scalability. This motivates us to introduce the notion of asymptotic hitting time in the current paper. 

Arrange all population states in the order of their super individual's fitness from high to low (where the
populations having the same super individual are arranged together but in any order) and write them in
vector form:

$$(X_1, X_2, \cdots)^T,$$

then write their hitting times in vector form:

$$\mathbf{m}^{(\mu)} = (m^{(\mu)}(X_1), m^{(\mu)}(X_2), \cdots)^T.$$

The transition matrix $\mathbf{P}^{(\mu)}$ of an absorbing Markov chain can be written in the following canonical form,

$$\mathbf{P}^{(\mu)} = \begin{pmatrix} \mathbf{I} & \mathbf{O} \\ * & \mathbf{T}^{(\mu)} \end{pmatrix}, \tag{7}$$

where the first block row and column correspond to the optimal set $S_{opt}^{(\mu)}$ and the second to the non-optimal
set $S_{non}^{(\mu)}$. Here $\mathbf{I}$ is a unit matrix for transitions within the optimal set, and $\mathbf{O}$ a zero matrix. The
matrix $\mathbf{T}^{(\mu)}$ denotes transitions within non-optimal states. The part $*$ represents transitions from
non-optimal states to optimal states but plays no role in the analysis.

Clearly $\forall X \in S_{opt}^{(\mu)}$, $m^{(\mu)}(X) = 0$, so that there is no need to discuss the hitting time when starting from
states in the optimal set. Thus, the analysis is focused on the non-optimal set. Throughout the paper, $\mathbf{m}^{(\mu)}$
denotes the hitting time vector restricted to the non-optimal states.

3.3 Asymptotic Hitting Time and its Meaning 

Perhaps the most elegant part of the classical theory of absorbing Markov chains revolves around the notion
of a fundamental matrix (Definition 3.2.2 in [27]).

Definition 2. The matrix

$$\mathbf{N}^{(\mu)} := (\mathbf{I} - \mathbf{T}^{(\mu)})^{-1} \tag{8}$$

is called the fundamental matrix of the absorbing Markov chain represented by the transition matrix in (7).

The expected number of generations before being absorbed when starting in transient state $X$ is the
sum of the entries in the $X$-th row of $\mathbf{N}^{(\mu)}$ (Theorem 3.3.5 in [27]):

Theorem 1. The hitting time vector satisfies

$$\mathbf{m}^{(\mu)} = \mathbf{N}^{(\mu)} \mathbf{1}, \tag{9}$$

where $\mathbf{1} = (1, 1, \ldots, 1)^T$.
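
To make Theorem 1 concrete, the sketch below computes the hitting time vector for a toy chain by solving $(\mathbf{I} - \mathbf{T})\mathbf{m} = \mathbf{1}$, which is equivalent to $\mathbf{m} = \mathbf{N}\mathbf{1}$. The $2 \times 2$ matrix `T` is an arbitrary illustrative example, not taken from any EA in the paper, and the helper names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def hitting_times(T):
    """Return m = N 1 by solving (I - T) m = 1, where T acts on non-optimal states."""
    n = len(T)
    i_minus_t = [[(1.0 if i == j else 0.0) - T[i][j] for j in range(n)]
                 for i in range(n)]
    return solve(i_minus_t, [1.0] * n)


# Toy transition matrix among two non-optimal states; each row sums to less
# than 1, and the missing mass is the probability of entering the optimal set.
T = [[0.5, 0.3],
     [0.1, 0.6]]
m = hitting_times(T)
```

Solving the linear system avoids forming the inverse explicitly; for this toy chain the exact values are $m = (0.7/0.17,\ 0.6/0.17)$.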

The spectral radius has played a central role in analysing the convergence of EAs. For example, an EA
is convergent if and only if the spectral radius of its transition matrix $\mathbf{T}^{(\mu)}$ is less than 1. In this paper, the
spectral radius of the fundamental matrix, $\rho(\mathbf{N}^{(\mu)})$, will be taken to measure the performance of an EA.

Definition 3. The asymptotic hitting time of an EA is defined by

$$\rho(\mathbf{N}^{(\mu)}),$$

which is the spectral radius of the fundamental matrix $\mathbf{N}^{(\mu)}$.



Footnote: $\mathbf{v}$ represents a column vector and $\mathbf{v}^T$ the row vector obtained with the transpose operation.

Footnote: In absorbing Markov chains, the canonical form of the transition matrix is represented by transient
states and absorbing states. See wikipedia.org under the entry "Absorbing Markov chain".

Footnote: Given a fundamental matrix $\mathbf{N} = [N_{ij}]$, its $(i, j)$-entry is the expected number of visits to a
transient state $j$ starting from a transient state $i$ (before being absorbed). See wikipedia.org under the entry
"Absorbing Markov chain".

Footnote: The spectral radius of a square matrix $\mathbf{A}$ is the supremum among the absolute values of the
eigenvalues; see wikipedia.org under the entry "Spectral Radius".
The spectral radius of the fundamental matrix can be calculated from that of the transition matrix.

Lemma 1. Let $\mathbf{T}^{(\mu)}$ be the transition matrix of an EA and $\mathbf{N}^{(\mu)}$ the fundamental matrix; then

$$\rho(\mathbf{N}^{(\mu)}) = \frac{1}{1 - \rho(\mathbf{T}^{(\mu)})}. \tag{10}$$

Proof. For simplicity, remove the superscript $(\mu)$ in the analysis. From the definition

$$\mathbf{N} = (\mathbf{I} - \mathbf{T})^{-1},$$

it follows that $\lambda$ is an eigenvalue of $\mathbf{T}$ if and only if $(1 - \lambda)^{-1}$ is an eigenvalue of $\mathbf{N}$.

Since $\mathbf{T}$ is non-negative, then according to the Perron-Frobenius theorem (p. 670 in [28]), $\rho(\mathbf{T})$ is an
eigenvalue of $\mathbf{T}$ such that

$$\rho(\mathbf{T}) \ge |\lambda|,$$

where $\lambda$ is any eigenvalue of $\mathbf{T}$.

Hence $(1 - \rho(\mathbf{T}))^{-1}$ is an eigenvalue of $\mathbf{N}$. Since $\rho(\mathbf{T}) < 1$, it satisfies

$$\frac{1}{1 - \rho(\mathbf{T})} \ge \frac{1}{1 - |\lambda|} \ge \frac{1}{|1 - \lambda|},$$

so $(1 - \rho(\mathbf{T}))^{-1}$ is the largest eigenvalue of $\mathbf{N}$ in absolute value, i.e., the spectral radius. □
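
The identity (10) can be checked numerically on a small example. The sketch below estimates both spectral radii by power iteration, which is an assumption of this illustration (it is adequate for the positive toy matrix used here, but it is not the paper's method), and forms $\mathbf{N}$ by Gauss-Jordan inversion. The matrix `T` and all function names are hypothetical.

```python
def mat_vec(a, v):
    """Matrix-vector product a v for a list-of-lists matrix."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def spectral_radius(a, iters=2000):
    """Power iteration; assumes a non-negative primitive matrix."""
    v = [1.0] * len(a)
    rho = 0.0
    for _ in range(iters):
        w = mat_vec(a, v)
        rho = max(w)              # v always has max-norm 1
        v = [x / rho for x in w]
    return rho


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


T = [[0.5, 0.3],
     [0.1, 0.6]]                       # toy matrix on two non-optimal states
N = inverse([[(1.0 if i == j else 0.0) - T[i][j] for j in range(2)]
             for i in range(2)])       # fundamental matrix (I - T)^{-1}
rho_T = spectral_radius(T)
rho_N = spectral_radius(N)
```

For this `T` the eigenvalues are $(1.1 \pm \sqrt{0.13})/2$, so `rho_T` is about 0.730 and `rho_N` agrees with `1 / (1 - rho_T)` up to rounding.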

What does the asymptotic hitting time mean? Three interpretations are given in the following. The first
meaning is explained through a non-elitist EA with the following property:

• each non-optimal state can be reached from any non-optimal state after finitely many generations.

In this case the transition matrix $\mathbf{T}^{(\mu)}$ is primitive.

Lemma 2. Suppose the transition matrix $\mathbf{T}^{(\mu)}$ associated with an EA is primitive. Then $1 - \rho(\mathbf{T}^{(\mu)})$ is the
asymptotic conditional probability of the EA entering the optimal set per generation, i.e.,

$$1 - \rho(\mathbf{T}^{(\mu)}) = \lim_{t \to +\infty} \Pr(\Phi_t \in S_{opt}^{(\mu)} \mid \Phi_{t-1} \in S_{non}^{(\mu)}).$$

The asymptotic hitting time $\rho(\mathbf{N}^{(\mu)})$ is the reciprocal of the above asymptotic conditional probability.

Proof. For simplicity, remove the superscript $(\mu)$ in the proof. Since $\Phi_t$ takes random values, consider the
probability distribution of $\Phi_t$ in the analysis. Denote the probability of $\Phi_t$ being in a non-optimal state $X$ by

$$p_t(X) := \Pr(\Phi_t = X),$$

and denote the probability distribution of $\Phi_t$ in the non-optimal set by the vector

$$\mathbf{p}_t := (p_t(X_1), p_t(X_2), \cdots)^T.$$

Then the Markov chain is represented by

$$\mathbf{p}_{t+1}^T = \mathbf{p}_t^T \mathbf{T} = \mathbf{p}_0^T \mathbf{T}^{t+1}. \tag{11}$$

Suppose that initially the population $\Phi_0$ is in a non-optimal state $X$, that is,

$$\Pr(\Phi_0 = X) = p_0(X) = 1.$$



Footnote: Perron-Frobenius theorem: for any non-negative matrix $\mathbf{A}$, there exists a Perron root which is an
eigenvalue of $\mathbf{A}$ and equals the spectral radius. The eigenvector corresponding to the Perron root is
non-negative. See wikipedia.org under the entry "Perron Frobenius theorem".

Footnote: A square non-negative matrix $\mathbf{A}$ is said to be primitive if there exists a positive integer $k$ such
that $\mathbf{A}^k > 0$, that is, every entry is positive after $k$ iterations. See wikipedia.org under the entry "Perron
Frobenius theorem".
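Since the probability of $\Phi_t$ being in the non-optimal set is

$$\Pr(\Phi_t \in S_{non}) = \mathbf{p}_t^T \mathbf{1},$$

then the probability of $\Phi_t$ entering the optimal set at the $t$-th generation is equal to

$$\Pr(\Phi_t \in S_{opt}, \Phi_{t-1} \in S_{non}) = \Pr(\Phi_{t-1} \in S_{non}) - \Pr(\Phi_t \in S_{non}, \Phi_{t-1} \in S_{non}) = \mathbf{p}_{t-1}^T \mathbf{1} - \mathbf{p}_t^T \mathbf{1}.$$

Thus the conditional probability of $\Phi_t$ entering the optimal set when $\Phi_{t-1}$ is in the non-optimal set equals

$$\Pr(\Phi_t \in S_{opt} \mid \Phi_{t-1} \in S_{non}) = \frac{\Pr(\Phi_t \in S_{opt}, \Phi_{t-1} \in S_{non})}{\Pr(\Phi_{t-1} \in S_{non})} = \frac{\mathbf{p}_{t-1}^T \mathbf{1} - \mathbf{p}_t^T \mathbf{1}}{\mathbf{p}_{t-1}^T \mathbf{1}} = 1 - \frac{\mathbf{p}_t^T \mathbf{1}}{\mathbf{p}_{t-1}^T \mathbf{1}} = 1 - \frac{\mathbf{p}_0^T \mathbf{T}^t \mathbf{1}}{\mathbf{p}_0^T \mathbf{T}^{t-1} \mathbf{1}}.$$

Denote

$$\mathbf{q}_t = \mathbf{T}^t \mathbf{1}, \quad t = 0, 1, 2, \cdots.$$

Recalling $p_0(X) = 1$, write the probability distribution of $\Phi_0$ as

$$\mathbf{p}_0 = (0, \cdots, 0, 1, 0, \cdots, 0)^T,$$

then

$$\mathbf{p}_0^T \mathbf{q}_t = q_t(X).$$

Thus the conditional probability of $\Phi_t$ entering the optimal set, under the condition that $\Phi_{t-1}$ is not in the
optimal set, is rewritten as

$$\Pr(\Phi_t \in S_{opt} \mid \Phi_{t-1} \in S_{non}) = 1 - \frac{q_t(X)}{q_{t-1}(X)}.$$

According to Theorem 2 in [29],

$$\lim_{t \to +\infty} \frac{q_t(X)}{q_{t-1}(X)} = \rho(\mathbf{T}),$$

then

$$\lim_{t \to +\infty} \Pr(\Phi_t \in S_{opt} \mid \Phi_{t-1} \in S_{non}) = 1 - \lim_{t \to +\infty} \frac{q_t(X)}{q_{t-1}(X)} = 1 - \rho(\mathbf{T}).$$

This proves the conclusion: $1 - \rho(\mathbf{T})$ is the asymptotic conditional probability of the EA entering the
optimal set per generation. Then the asymptotic hitting time is the reciprocal of the asymptotic conditional
probability. □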
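
The limit used in the proof can be observed numerically. The sketch below iterates $\mathbf{q}_t = \mathbf{T}^t \mathbf{1}$ for a toy primitive matrix and returns the ratios $q_t(X)/q_{t-1}(X)$ for every starting state $X$. The matrix `T` and the function names are illustrative assumptions.

```python
def mat_vec(a, v):
    """Matrix-vector product a v for a list-of-lists matrix."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def survival_ratios(T, steps):
    """Return q_t(X) / q_{t-1}(X) for each state X at t = steps (steps >= 1)."""
    q_prev = [1.0] * len(T)        # q_0 = 1
    q = mat_vec(T, q_prev)         # q_1 = T 1
    for _ in range(steps - 1):
        q_prev, q = q, mat_vec(T, q)
    return [a / b for a, b in zip(q, q_prev)]


T = [[0.5, 0.3],
     [0.1, 0.6]]                   # toy primitive matrix on non-optimal states
ratios = survival_ratios(T, steps=60)
absorb_prob = [1.0 - r for r in ratios]   # conditional probability of entering the optimal set
```

For this `T`, both ratios approach $\rho(\mathbf{T}) = (1.1 + \sqrt{0.13})/2$ regardless of the starting state, so the per-generation conditional probability of reaching the optimal set approaches $1 - \rho(\mathbf{T})$.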
The second meaning of the asymptotic hitting time is based on the asymptotic convergence rate of EAs,
defined as follows (the same as that in iterative methods, p. 73 in [30]).

Definition 4. The asymptotic convergence rate of an EA is

$$-\ln \rho(\mathbf{T}^{(\mu)}), \tag{12}$$

where $\rho(\mathbf{T}^{(\mu)})$ is the spectral radius of $\mathbf{T}^{(\mu)}$ and $\ln(\cdot)$ is the natural logarithm.

It represents an exponential decay rate for a sharp upper bound on the probability of the EA remaining in
the non-optimal set per generation. The asymptotic convergence rate is by far the simplest practical measure of
rapidity of convergence of an iterative method in common use (p. 73 in [30]). EAs discussed in this
paper belong to iterative methods.

The asymptotic hitting time approximately equals the reciprocal of the asymptotic convergence rate
[26], i.e.,

$$\text{asymptotic hitting time} \approx \frac{1}{\text{asymptotic convergence rate}}.$$

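The following proposition gives a more accurate statement.

Proposition 1. If $\rho(\mathbf{T}^{(\mu)}) \ge 0.5$ then

$$\frac{1}{-\ln \rho(\mathbf{T}^{(\mu)})} \le \rho(\mathbf{N}^{(\mu)}) \le \frac{2 \ln 2}{-\ln \rho(\mathbf{T}^{(\mu)})}, \tag{13}$$

and

$$\lim_{\rho(\mathbf{T}^{(\mu)}) \to 1} (-\ln \rho(\mathbf{T}^{(\mu)})) \rho(\mathbf{N}^{(\mu)}) = 1. \tag{14}$$

Proof. Since, by Lemma 1,

$$(-\ln \rho(\mathbf{T}^{(\mu)})) \rho(\mathbf{N}^{(\mu)}) = \frac{-\ln \rho(\mathbf{T}^{(\mu)})}{1 - \rho(\mathbf{T}^{(\mu)})},$$

then let $w = \rho(\mathbf{T}^{(\mu)})$ and define

$$g(w) := \frac{-\ln w}{1 - w}.$$

If $0.5 \le w < 1$, then

$$1 \le g(w) \le 2 \ln 2,$$

which proves the first conclusion.

Let $w \to 1$, then according to L'Hôpital's rule (p. 107 in [31])

$$g(w) \to 1,$$

which proves the second conclusion. □

Figure 1 illustrates the function $g(w)$.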

Remark The condition $\rho(\mathbf{T}^{(\mu)}) \ge 0.5$ is very mild. It is equivalent to $\rho(\mathbf{N}^{(\mu)}) \ge 2$, i.e., the asymptotic
hitting time of an EA is not less than 2. This will be true for most optimization problems.
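
As a quick numerical sanity check of Proposition 1, the sketch below evaluates $g(w) = -\ln w / (1 - w)$, the product of the asymptotic convergence rate and the asymptotic hitting time, on a grid of values $w = \rho(\mathbf{T})$. The function names and the small floating-point tolerance are assumptions of this illustration.

```python
import math


def g(w):
    """Asymptotic convergence rate times asymptotic hitting time, as a function of w = rho(T)."""
    return -math.log(w) / (1.0 - w)


def bounds_hold(w, tol=1e-12):
    """Check 1 <= g(w) <= 2 ln 2, allowing a small floating-point tolerance."""
    return 1.0 - tol <= g(w) <= 2.0 * math.log(2.0) + tol


grid = [0.5 + 0.001 * k for k in range(500)]   # w from 0.5 up to 0.999
all_ok = all(bounds_hold(w) for w in grid)
```

On this grid the upper bound $2 \ln 2 \approx 1.386$ is attained at $w = 0.5$, and $g(w)$ decreases towards 1 as $w$ approaches 1, which is the sense in which the two quantities are approximately reciprocal.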

Finally, the asymptotic hitting time is a mean hitting time when the initial population $\Phi_0$ follows a
specific probability distribution.

Lemma 3. Let the vector $\mathbf{p}_0$ be the left eigenvector corresponding to $\rho(\mathbf{N}^{(\mu)})$, normalized so that its entries
sum to 1; then $\rho(\mathbf{N}^{(\mu)})$ is the mean hitting time when the initial population $\Phi_0$ takes on the probability
distribution $\mathbf{p}_0$.

Figure 1: The asymptotic hitting time approximately equals the reciprocal of the asymptotic convergence
rate.



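Proof. For simplicity, remove the superscript $(\mu)$ in the proof.

Since $\rho(\mathbf{N})$ is the spectral radius of matrix $\mathbf{N}$, there exists some left eigenvector $\mathbf{p}_0$ such that

$$\mathbf{p}_0^T \rho(\mathbf{N}) = \mathbf{p}_0^T \mathbf{N}.$$

Since matrix $\mathbf{N}$ is non-negative, then according to the Perron-Frobenius theorem (see p. 670 in [28]), $\mathbf{p}_0$ is
a non-negative vector.

Normalize $\mathbf{p}_0$ to 1:

$$\mathbf{p}_0^T \mathbf{1} = 1.$$

Then

$$\mathbf{p}_0^T \rho(\mathbf{N}) \mathbf{1} = \mathbf{p}_0^T \mathbf{N} \mathbf{1} = \mathbf{p}_0^T \mathbf{m}.$$

So

$$\rho(\mathbf{N}) = \sum_{X \in S_{non}} p_0(X) m(X),$$

thus $\rho(\mathbf{N})$ is the mean hitting time when the initial population $\Phi_0$ follows the probability distribution
$\mathbf{p}_0$. □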
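
Lemma 3 can also be checked on a toy chain. The sketch below uses the fact that a left eigenvector of $\mathbf{T}$ for $\rho(\mathbf{T})$ is also a left eigenvector of $\mathbf{N} = (\mathbf{I} - \mathbf{T})^{-1}$ for $\rho(\mathbf{N})$, finds it by power iteration (an assumption suitable for the positive toy matrix used here), and compares the mean hitting time under that distribution with $1/(1 - \rho(\mathbf{T}))$. The matrix and all helper names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def left_perron_vector(T, iters=2000):
    """Return (p0, rho(T)): the left eigenvector normalised to sum 1, by power iteration."""
    n = len(T)
    v = [1.0 / n] * n
    s = 0.0
    for _ in range(iters):
        w = [sum(v[i] * T[i][j] for i in range(n)) for j in range(n)]
        s = sum(w)                 # equals rho(T) once v has converged
        v = [x / s for x in w]
    return v, s


T = [[0.5, 0.3],
     [0.1, 0.6]]                   # toy matrix on two non-optimal states
p0, rho_T = left_perron_vector(T)
m = solve([[(1.0 if i == j else 0.0) - T[i][j] for j in range(2)]
           for i in range(2)], [1.0, 1.0])        # hitting times m = N 1
mean_hitting_time = sum(p * h for p, h in zip(p0, m))
```

For this `T`, `mean_hitting_time` coincides with `1 / (1 - rho_T)`, i.e. with the asymptotic hitting time $\rho(\mathbf{N})$, up to rounding.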

3.4 Population Speedup 

Population speedup under the asymptotic hitting time is defined as follows. 

Definition 5. Given a $(1+1)$ elitist EA and a $(\mu+\mu)$ elitist EA ($\mu > 1$) using the same mutation operator
for maximizing the same fitness function, the population speedup of the $(\mu+\mu)$ elitist EA over the $(1+1)$ EA
under the asymptotic hitting time is defined as the ratio

$$speedup(\mu) := \frac{\rho(\mathbf{N}^{(1)})}{\rho(\mathbf{N}^{(\mu)})} = \frac{1 - \rho(\mathbf{T}^{(\mu)})}{1 - \rho(\mathbf{T}^{(1)})}. \tag{15}$$

Remark The concept of population speedup is inspired by the concept of speedup widely used in parallel
algorithms [32]. However, population speedup does not depend on the number of parallel computing processors.

Remark There is an essential difference between population speedup and No Free Lunch Theorems [33].
Population speedup compares the performance of two EAs using the same genetic operators for maximizing
the same fitness function, while No Free Lunch Theorems compare the average performance of two EAs for
maximizing all fitness functions.



*In the standard description of Perron-Frobenius theorem, the assertion is given to its right eigenvector, but it also holds for 
its left eigenvector. 
Definition 6. Given a $(1+1)$ elitist EA and a $(\mu+\mu)$ elitist EA using the same mutation operator for
maximizing the same fitness function, the $(\mu+\mu)$ elitist EA is said to have super linear speedup if

$$speedup(\mu) > \mu. \tag{16}$$
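
Definitions 5 and 6 translate directly into a small helper. In the sketch below the two spectral radii are taken as given inputs; the function names and the input validation are assumptions of this illustration.

```python
def population_speedup(rho_t1, rho_tmu):
    """speedup(mu) = rho(N^(1)) / rho(N^(mu)) = (1 - rho(T^(mu))) / (1 - rho(T^(1)))."""
    if not (0.0 <= rho_t1 < 1.0 and 0.0 <= rho_tmu < 1.0):
        raise ValueError("spectral radii must lie in [0, 1)")
    return (1.0 - rho_tmu) / (1.0 - rho_t1)


def is_super_linear(rho_t1, rho_tmu, mu):
    """Definition 6: super linear speedup means speedup(mu) > mu."""
    return population_speedup(rho_t1, rho_tmu) > mu
```

For instance, with hypothetical values $\rho(\mathbf{T}^{(1)}) = 0.99$ and $\rho(\mathbf{T}^{(4)}) = 0.9$, the speedup is about 10, which exceeds $\mu = 4$; with $0.9$ and $0.8$ it is about 2, which does not.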



In the above definition, a basic requirement is that both EAs must adopt the same mutation operator. This
ensures the comparison is meaningful. However, it is impossible to require the selection operators to be the
same, since even under the same name, e.g., proportional selection, the selection operators within EAs using
different population sizes are never identical. Usually the corresponding conditional selection probabilities are
not equal for different population sizes.


Remark There is a link between super linear population speedup and super linear speedup in parallel EAs. 
If each individual is assigned to one processor, then elitist EA becomes parallel EAs. Under this circumstance, 
super linear speedup implies super linear speedup if ignoring the communication cost. An interesting question 
in parallel EAs is whether super linear speedup exists and how this phenomenon happens. Although super 
linear speedup sometimes is controversial |34| . previous studies have confirmed that parallel EAs can come 
out with super linear speedup on parallel machines |35i 136) . 



The rest of the paper focuses on investigating elitist EAs which adopts global mutation and elitist selection 
operators. The corresponding definitions appear below. 

• A mutation operator is called global if every state can reach another state through mutation. 



• A selection operator is called elitist if the parent super individual is replaced by the child super individual 
only in case when the child super individual is fitter. If let x be the super individual of $*(= X), y the 
super individual of 5't+i/2(= Y), then the super individual of 4>t_|_i(= Z), denoted by z, is selected as 
follows: 



There is no restriction on selecting non-super individuals and any selection strategy can be applied. 

Remark An alternative elitist operator is to replace the parent super individual by a child with a better 
or equal fitness [12]. This paper will not discuss this variant. 

Using global mutation will guarantee that each state is reachable. Elitist selection is used to maintain the 
best solution found over time. EAs, using global mutation and elitist selection, have two good properties. 

This first property is called the "global mutation property" . It compares the probability of going from 
a population state to the higher fitness level and the probability of going from one individual state to the 
higher level. 

Denote S\^^^{x) to be the set consisting of all populations whose super individual's fitness is larger than 
/(a;), where x G S'^oL Denote P{X, s[l!\^{x)) the probability of going from X to the set P(X, s'^lj^x)). 

Lemma 4. Let X = (xi, • • ■ , x^) G 5non whose super individual xi = x, then the "global mutation property" 
in elitist EAs is for all i = 1, ■ ■ ■ , fi, and any fi > 2, 



Pr{(j)t+i ^ z\(j)t^ X, 04+1/2 = y) 
^Pr($t+i =Y\^t=X, $i+i/2 = Y). 



4 Elitist EAs and its Analysis 



4.1 Elitist EAs 



Pri(t)t+i/2 ^y\(j)t^x)>0. 




(17) 




(18) 
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Proof. Since the event of going from state X to the higher fitness level is equal to the event of at least one 
of states Xi to the higher fitness level. 

Through mutation, the probability of going from state Xi to the higher fitness level is 

Pr{(f't+i/2 I 0t = x^) = P{x^,sl\l^^{x)), 
and then probability of the event of none of states Xi to the higher fitness level is 

Pr{^t+, i Si^l^ I -ft^X) = flil - Pix.,sl^l^{x)), 

so its probability of states X to the higher fitness level happening is 

Pr($i+i G I = X) = 1 - f[il - P{x,, 

i=l 

Hence for all z = 1, • • • , /x 

W5gjX))>P(x„5^;>,(x)), 
F(X,5g,(X))<f]P(x„5«,(x)). 

i=l 

and the above inequalities are strict for /i > 2. □ 
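The two bounds in the global mutation property can be checked numerically. The sketch below assumes, as in the proof, that the individuals are mutated independently; the per-individual probabilities in `p` are hypothetical values, not taken from any specific fitness function.

```python
from math import prod


def population_improve_prob(p):
    """Probability that at least one individual reaches the higher fitness level,
    given the per-individual probabilities p[i] and independent mutation."""
    return 1.0 - prod(1.0 - p_i for p_i in p)


# Hypothetical values of P(x_i, S_high^(1)(x)) for a population of size 3.
p = [0.10, 0.30, 0.05]
P_X = population_improve_prob(p)

print(P_X)                      # about 0.4015
print(max(p) < P_X < sum(p))    # True: both bounds of the property hold strictly
```

The upper bound is the union bound, and the lower bound says that a population is at least as likely to improve as its best-placed member.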

Using elitist selection, the super-individual in a population will either enter the higher fitness level or keep 
unchanged, and never becomes worse. This is called the elitist selection property. It is formalized as follows. 

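Denote by $S_{\mathrm{same}}^{(\mu)}(x)$ the set consisting of all populations whose super individual is $x$, where $x \in S_{\mathrm{non}}^{(1)}$.
Denote by $P(X, S_{\mathrm{same}}^{(\mu)}(x))$ the probability of going from state $X$ to the set $S_{\mathrm{same}}^{(\mu)}(x)$.

Lemma 5. Let $X \in S_{\mathrm{non}}^{(\mu)}$ be a population whose super individual is $x$. The elitist selection property in elitist EAs is

$$P(X, S_{\mathrm{same}}^{(\mu)}(x)) + P(X, S_{\mathrm{high}}^{(\mu)}(x)) = 1. \tag{19}$$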

A great number of mutation operators and selection operators satisfy these two properties.

Example 5. An example of global mutation.
Consider pseudo-Boolean optimization [37].

• Bitwise Mutation: Given a binary string $x$, flip each of its bits independently with a positive probability.

Example 6. An example of elitist selection.

• Elitist Proportional Selection: the super individual is replaced if the child super individual is better
than it; and non-super individuals are selected from the two populations $X$ and $Y$ by so-called proportional
selection [38].

4.2 Transition Matrices of Elitist EAs 

In the current subsection we investigate the transition matrix associated with an elitist EA. Suppose that a
$(1+1)$ elitist EA and a $(\mu+\mu)$ elitist EA use the identical mutation operator to maximize the same fitness
function.

First consider the $(1+1)$ elitist EA. Arrange all states in $S_{\mathrm{non}}^{(1)}$ in the order of their fitness from high
to low (where the individuals at the same fitness level may be arranged in any order), and write them in a
vector form

$$(x_1, x_2, x_3, \ldots)^T,$$

where $f(x_1) \ge f(x_2) \ge f(x_3) \ge \cdots$.

Elitist selection ensures that the super individual never becomes worse and, therefore, the super individual
never revisits any state at a lower fitness level. Thus the transition probability satisfies: for any $x, y$ with
$f(y) < f(x)$,

$$P(x, y) = 0.$$

It follows that the transition matrix $\mathbf{T}^{(1)}$ is a lower triangular matrix, which can be written in the
form:

$$\mathbf{T}^{(1)} = \begin{pmatrix} P(x_1, x_1) & 0 & \cdots & 0 \\ P(x_2, x_1) & P(x_2, x_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \end{pmatrix}. \tag{20}$$

Since the transition matrix is a lower triangular matrix, the following lemma holds.

Lemma 6. Given the Markov chain associated with a $(1+1)$ elitist EA, let $\mathbf{T}^{(1)}$ be its transition matrix.
Let $x_p$ be the state such that

$$x_p := \arg\max\{P(x, x);\ x \in S_{\mathrm{non}}^{(1)}\}, \tag{21}$$

then the spectral radius

$$\rho(\mathbf{T}^{(1)}) = P(x_p, x_p). \tag{22}$$

Proof. It comes from the fact: if $\mathbf{A}$ is a lower triangular matrix, then its eigenvalues are the diagonal entries
(see Exercise 7.1.3 in [28]).

Let $\mathbf{A} = \mathbf{T}^{(1)}$; then the diagonal entries of $\mathbf{T}^{(1)}$ are its eigenvalues and are non-negative, so its spectral
radius is the largest eigenvalue

$$\rho(\mathbf{T}^{(1)}) = P(x_p, x_p),$$

which proves the conclusion. □

$P(x, x)$ is the probability of remaining in state $x$. The above lemma shows that in the $(1+1)$ EA, the spectral
radius equals the maximum probability of remaining in a non-optimal state.
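As a numerical illustration of Lemma 6, the following sketch builds $\mathbf{T}^{(1)}$ for a $(1+1)$ elitist EA with bitwise mutation (rate $1/n$) on One-Max with $n = 4$. The helper names, the choice $n = 4$, and the use of power iteration to estimate the spectral radius are assumptions made for this illustration only.

```python
from itertools import product


def mutation_prob(x, y):
    """Bitwise mutation with rate 1/n: probability of mutating x into y."""
    n = len(x)
    d = sum(a != b for a, b in zip(x, y))
    return (1 / n) ** d * (1 - 1 / n) ** (n - d)


def transition_matrix(n, fitness):
    """Transition matrix of the (1+1) elitist EA restricted to non-optimal states,
    with states sorted by fitness from high to low."""
    all_states = list(product((0, 1), repeat=n))
    f_max = max(fitness(s) for s in all_states)
    states = sorted((s for s in all_states if fitness(s) < f_max),
                    key=fitness, reverse=True)
    T = [[0.0] * len(states) for _ in states]
    for i, x in enumerate(states):
        for j, y in enumerate(states):
            if fitness(y) > fitness(x):      # strict elitism: only fitter children replace x
                T[i][j] = mutation_prob(x, y)
        improve = sum(mutation_prob(x, y) for y in all_states if fitness(y) > fitness(x))
        T[i][i] = 1.0 - improve              # probability of remaining in x
    return states, T


def spectral_radius(T, iters=2000):
    """Power-iteration estimate of the spectral radius of a non-negative matrix."""
    v = [1.0] * len(T)
    lam = 0.0
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in T]
        lam = max(w)
        v = [w_i / lam for w_i in w]
    return lam


states, T = transition_matrix(4, sum)        # One-Max: fitness is the number of ones
max_diag = max(T[i][i] for i in range(len(T)))
print(max_diag, spectral_radius(T))          # the two values agree
```

Sorting the states by fitness is what makes the matrix lower triangular; the power iteration is used only as an independent check on the largest diagonal entry.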

Example 7. Consider the (1 + 1) EA using bitwise mutation with the mutation rate 1/n and elitist selection 
for maximizing the One-Max function. 

The probability of the EA remaining in a non-optimal state is maximal in a state such as $x = (10 \cdots 0)$
or $(0 \cdots 01)$, and equals

Example 8. Consider the (1 + 1) EA using bitwise mutation with the mutation rate 1/n and elitist selection 
for maximizing the Fully-Deceptive function. 

The probability of the EA remaining in a non-optimal state is maximal in a state such as $x = (1 \cdots 1)$,
and equals

p(tW) = i- 

Next consider a $(\mu+\mu)$ EA where $\mu \ge 2$.

Arrange all populations in $S_{\mathrm{non}}^{(\mu)}$ in the order of the fitness of their super individual from high to low
(where populations with the same super individual are arranged together), and write them in a vector form:
$(X_1, X_2, \ldots)^T$. Let $x_1, x_2, \ldots$ be their super individuals; then their fitness satisfies

$$f(x_1) \ge f(x_2) \ge \cdots.$$



*Given a lower triangular matrix $\mathbf{A}$, the diagonal entries of $\mathbf{A}$ give the multiset of eigenvalues of $\mathbf{A}$. See wikipedia.org under
the entry "Triangular Matrix".

Elitist selection means that the super individual never becomes worse, and so the super individual in a
population never revisits any state at a lower fitness level. Thus the probability of going from state $X$ (with
the super individual $x$) to a state $Y$ not in the set $S_{\mathrm{same}}^{(\mu)}(x)$ or $S_{\mathrm{high}}^{(\mu)}(x)$ is

$$P(X, Y) = 0.$$

Hence the matrix $\mathbf{T}^{(\mu)}$ is a block lower triangular matrix whose rows and columns are grouped by the sets
$S_{\mathrm{same}}^{(\mu)}(x_1), S_{\mathrm{same}}^{(\mu)}(x_2), \ldots$, and which can be written in the form:

$$\mathbf{T}^{(\mu)} = \begin{pmatrix} \mathbf{T}^{(\mu)}_{x_1,x_1} & \mathbf{O} & \cdots \\ \mathbf{T}^{(\mu)}_{x_2,x_1} & \mathbf{T}^{(\mu)}_{x_2,x_2} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}, \tag{23}$$

where $\mathbf{O}$ is a zero sub-matrix and $\mathbf{T}^{(\mu)}_{x,y}$ is a transition matrix for transitions from states in the set $S_{\mathrm{same}}^{(\mu)}(x)$ to
states in the set $S_{\mathrm{same}}^{(\mu)}(y)$. In particular, $\mathbf{T}^{(\mu)}_{x,x}$ is the transition matrix for transitions within the set $S_{\mathrm{same}}^{(\mu)}(x)$.
The following lemma is an extension of Lemma 6 for the case when $\mu \ge 2$.

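Lemma 7. Let $\mathbf{T}^{(\mu)}_{x,x}$ be a matrix given in (23), which represents transitions within the set $S_{\mathrm{same}}^{(\mu)}(x)$; then

$$\rho(\mathbf{T}^{(\mu)}) = \max_{x \in S_{\mathrm{non}}^{(1)}} \rho(\mathbf{T}^{(\mu)}_{x,x}). \tag{24}$$

Proof. The proof is based on the following fact (see Exercise 7.1.4 in [28]): if $\mathbf{A}$ is a block lower triangular
matrix such that

$$\mathbf{A} = \begin{pmatrix} \mathbf{B} & \mathbf{O} \\ \mathbf{C} & \mathbf{D} \end{pmatrix},$$

then $\lambda$ is an eigenvalue of $\mathbf{A}$ if and only if $\lambda$ is an eigenvalue of $\mathbf{B}$ or $\mathbf{D}$. Thus

$$\rho(\mathbf{A}) = \max\{\rho(\mathbf{B}), \rho(\mathbf{D})\}.$$

Let $\mathbf{A} = \mathbf{T}^{(\mu)}$ (see the matrix (23)); then

$$\rho(\mathbf{T}^{(\mu)}) = \max_{x \in S_{\mathrm{non}}^{(1)}} \rho(\mathbf{T}^{(\mu)}_{x,x}),$$

which finishes the proof. □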
4.3 Lower and Upper Bounds on Spectral Radii 

This subsection discusses lower and upper bounds on the spectral radii. The first lemma gives the bounds 
on the spectral radius of the fundamental matrix and shows that the asymptotic hitting time is between the 
best case and the worst case hitting times. 

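Lemma 8. The asymptotic hitting time is between the best and worst case hitting times, i.e.,

$$\min_{X \in S_{\mathrm{non}}^{(\mu)}} m^{(\mu)}(X) \le \rho(\mathbf{N}^{(\mu)}) \le \max_{X \in S_{\mathrm{non}}^{(\mu)}} m^{(\mu)}(X).$$

Proof. From Theorem 1 it holds that

$$m^{(\mu)}(X) = \sum_{Y \in S_{\mathrm{non}}^{(\mu)}} N^{(\mu)}(X, Y),$$

that is, each hitting time is a row sum of the fundamental matrix. The rest of the proof is a direct application
of the following fact: given any $n \times n$ non-negative matrix $\mathbf{A} = [a_{ij}]$, its
spectral radius satisfies the inequalities* (see Exercise 8.2.7, page 668 of [28])

$$\min_{i} \sum_{j=1}^{n} a_{ij} \le \rho(\mathbf{A}) \le \max_{i} \sum_{j=1}^{n} a_{ij}.$$

Let $\mathbf{A} = \mathbf{N}^{(\mu)}$; this finishes the proof. □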
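A small worked instance of Lemma 8 is sketched below. The two-state transition matrix `T` is a hypothetical $(1+1)$ elitist chain (state 1 fitter than state 2), and the closed-form 2x2 inverse is used only to keep the example dependency-free.

```python
def inverse_2x2(M):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical transition matrix among two non-optimal states, fitter state first.
# Row sums are below 1; the missing mass is the probability of hitting the optimum.
T = [[0.9, 0.0],
     [0.3, 0.5]]

I_minus_T = [[1 - T[0][0], -T[0][1]],
             [-T[1][0], 1 - T[1][1]]]
N = inverse_2x2(I_minus_T)                        # fundamental matrix N = (I - T)^(-1)

hitting_times = [sum(row) for row in N]           # m(X) is the row sum of N
rho_T = max(T[0][0], T[1][1])                     # T is lower triangular (Lemma 6)
rho_N = 1.0 / (1.0 - rho_T)                       # asymptotic hitting time

print(hitting_times)                              # approximately [10, 8]
print(min(hitting_times), rho_N, max(hitting_times))
```

Here the asymptotic hitting time coincides with the worst-case hitting time, which is the upper end of the interval allowed by the lemma.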

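*The result in Exercise 8.2.7 (page 668 of [28]) is for positive matrices, but it is true for non-negative matrices. The proof is
the same with replacing the Collatz-Wielandt formula for positive matrices by the Collatz-Wielandt formula for non-negative
matrices.

The second lemma gives the bounds on the spectral radius of the transition matrix $\mathbf{T}^{(\mu)}_{x,x}$.

Lemma 9. For any $x \in S_{\mathrm{non}}^{(1)}$,

$$\rho(\mathbf{T}^{(\mu)}_{x,x}) \ge \min_{X \in S_{\mathrm{same}}^{(\mu)}(x)} P(X, S_{\mathrm{same}}^{(\mu)}(x)), \tag{26}$$

$$\rho(\mathbf{T}^{(\mu)}_{x,x}) \le \max_{X \in S_{\mathrm{same}}^{(\mu)}(x)} P(X, S_{\mathrm{same}}^{(\mu)}(x)). \tag{27}$$

Proof. The proof is again a direct application of the fact: given any $n \times n$ non-negative matrix $\mathbf{A} = [a_{ij}]$, its spectral
radius satisfies the inequalities

$$\min_{i} \sum_{j=1}^{n} a_{ij} \le \rho(\mathbf{A}) \le \max_{i} \sum_{j=1}^{n} a_{ij}.$$

Let $\mathbf{A} = \mathbf{T}^{(\mu)}_{x,x}$, whose row sum for the row indexed by $X$ is $P(X, S_{\mathrm{same}}^{(\mu)}(x))$; this finishes the proof. □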
4.4 Time-Based Fitness Landscape 

The set $S$ discussed in the paper is only required to be finite. No topological structure, such as neighbourhood
or distance, has been assigned to it. In order to study population scalability, the time-based fitness landscape
is introduced in the paper.

Definition 7. Given a fitness function $f(x)$ and a $(1+1)$ elitist EA, its associated time-based fitness
landscape is defined by the set of pairs

$$\{(m^{(1)}(x), f(x));\ x \in S^{(1)}\},$$

where $m^{(1)}(x)$ is the hitting time of the $(1+1)$ elitist EA starting from the state $x$ and $f(x)$ is its fitness.

If the mutation operator in the $(1+1)$ elitist EA or the fitness function changes, then the related time-based
fitness landscape will be different. The hitting time plays the role of "distance" on the fitness landscape.
The distance is "non-predictive" or "posterior", because its description is based on the hitting time of EAs.
It is impossible to construct a "predictive" or "prior" distance efficiently for describing the hardness of fitness
landscapes [39].

Time-based fitness landscapes are classified into two categories in the context of maximizing a fitness
function:

• Monotonic landscape: $\forall x \in S_{\mathrm{non}}^{(1)}, y \in S_{\mathrm{non}}^{(1)}$,

$$m^{(1)}(x) < m^{(1)}(y) \Rightarrow f(x) > f(y).$$

In this case, $f(x)$ is called a monotonic function with respect to the $(1+1)$ EA. The hitting time
$m^{(1)}(x)$, when $x$ is at the lowest fitness level, is maximal.

• Multi-modal landscape: $\exists x \in S_{\mathrm{non}}^{(1)}, y \in S_{\mathrm{non}}^{(1)}$,

$$m^{(1)}(x) < m^{(1)}(y) \wedge f(x) < f(y).$$

Example 9. Consider the $(1+1)$ EA using bitwise mutation with the mutation rate $1/n$ and elitist selection
for solving the One-Max function; then the related time-based fitness landscape is monotonic.

In the One-Max function, the larger the fitness, the shorter the hitting time. The optimal state is $(1 \cdots 1)$ whose
fitness is $n$, and the state $(0 \cdots 0)$ has the lowest fitness and its hitting time to $(1 \cdots 1)$ is the largest.

An important type of multi-modal fitness landscape is the deceptive fitness landscape, defined as follows:

• Deceptive landscape: $\forall x \in S_{\mathrm{non}}^{(1)}, y \in S_{\mathrm{non}}^{(1)}$,

$$m^{(1)}(x) < m^{(1)}(y) \Rightarrow f(x) < f(y).$$

In this case, $f(x)$ is called a deceptive function with respect to the $(1+1)$ EA. The hitting time $m^{(1)}(x)$,
when $x$ is at the 2nd highest fitness level, is maximal.

Example 10. Consider the $(1+1)$ EA using bitwise mutation with the mutation rate $1/n$ and elitist selection
for maximizing the Fully-Deceptive function; then the related time-based fitness landscape is deceptive.

In this example, except for the optimal state, the larger the fitness, the longer the hitting time. The optimal state
is $(0 \cdots 0)$ whose fitness is $n + 1$, and the state $(1 \cdots 1)$ has the 2nd largest fitness $n$ but its hitting time to
$(0 \cdots 0)$ is the largest.
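The classification in Examples 9 and 10 can be reproduced for a tiny instance. The sketch below computes the hitting times of a $(1+1)$ elitist EA with bitwise mutation for $n = 3$ and classifies the resulting time-based landscape. The form of the Fully-Deceptive function used here (all-zero string worth $n + 1$, every other string worth its number of ones) is an assumption consistent with the optimal and second-best values quoted above; the helper names are hypothetical.

```python
from itertools import product


def one_max(x):
    return sum(x)


def fully_deceptive(x):
    # Assumed form: all-zero string is optimal with fitness n + 1,
    # every other string scores its number of ones.
    return len(x) + 1 if sum(x) == 0 else sum(x)


def mutation_prob(x, y):
    """Bitwise mutation with rate 1/n: probability of mutating x into y."""
    n = len(x)
    d = sum(a != b for a, b in zip(x, y))
    return (1 / n) ** d * (1 - 1 / n) ** (n - d)


def hitting_times(n, fitness):
    """Expected hitting time m(x) of the optimum for the (1+1) elitist EA."""
    states = sorted(product((0, 1), repeat=n), key=fitness, reverse=True)
    f_max = fitness(states[0])
    m = {}
    for x in states:                  # fitter states first, so m[y] below is known
        if fitness(x) == f_max:
            m[x] = 0.0
            continue
        fitter = [y for y in states if fitness(y) > fitness(x)]
        leave = sum(mutation_prob(x, y) for y in fitter)
        # m(x) = 1 + P(x,x) m(x) + sum_y P(x,y) m(y), solved for m(x)
        m[x] = (1.0 + sum(mutation_prob(x, y) * m[y] for y in fitter)) / leave
    return m


def landscape_type(m, fitness, eps=1e-9):
    non_opt = [x for x in m if m[x] > 0.0]
    pairs = [(x, y) for x in non_opt for y in non_opt if m[x] < m[y] - eps]
    if all(fitness(x) > fitness(y) for x, y in pairs):
        return "monotonic"
    if all(fitness(x) < fitness(y) for x, y in pairs):
        return "deceptive"
    return "multi-modal"


m_one = hitting_times(3, one_max)
m_dec = hitting_times(3, fully_deceptive)
print(landscape_type(m_one, one_max))          # monotonic
print(landscape_type(m_dec, fully_deceptive))  # deceptive
```

The tolerance `eps` prevents floating-point noise between states at the same fitness level from being read as a genuine difference in hitting time.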

5 Analysis of Population speedup under the Asymptotic Hitting 
Time 

The combination of the fundamental matrix and time-based fitness landscape provides a novel approach to 
the analysis of population scalability. Several general results are drawn in this section using the asymptotic 
hitting time. 

5.1 Using a Population-based EA can Reduce the Asymptotic Hitting Time 

An intuition in using a population-based EA is that the number of generations for the EA to find the 
optimal state decreases if the population size increases. The following proposition proves this true under the 
asymptotic hitting time. Although the result seems trivial, a proof is still needed. 

Proposition 2. Given a $(1+1)$ elitist EA and a $(\mu+\mu)$ elitist EA (where $\mu \ge 2$) using the same mutation
operator for maximizing the same fitness function, using a population-based EA shortens the asymptotic
hitting time, i.e.,

$$\mathrm{speedup}(\mu) > 1.$$

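Proof. For the $(1+1)$ elitist EA, let $x_p \in S_{\mathrm{non}}^{(1)}$ be the individual such that

$$x_p := \arg\max\{P(x, x);\ x \in S_{\mathrm{non}}^{(1)}\},$$

then from Lemma 6,

$$\rho(\mathbf{T}^{(1)}) = P(x_p, x_p).$$

For the $(\mu+\mu)$ elitist EA, from Lemma 7,

$$\rho(\mathbf{T}^{(\mu)}) = \max_{x \in S_{\mathrm{non}}^{(1)}} \rho(\mathbf{T}^{(\mu)}_{x,x}).$$

Since the set $S_{\mathrm{non}}^{(1)}$ is finite, there exists an $x \in S_{\mathrm{non}}^{(1)}$ such that

$$\rho(\mathbf{T}^{(\mu)}) = \rho(\mathbf{T}^{(\mu)}_{x,x}). \tag{28}$$

For the above $x$, consider the transition matrix $\mathbf{T}^{(\mu)}_{x,x}$.

According to Lemma 9, the spectral radius $\rho(\mathbf{T}^{(\mu)}_{x,x})$ is upper bounded by

$$\rho(\mathbf{T}^{(\mu)}_{x,x}) \le \max_{X \in S_{\mathrm{same}}^{(\mu)}(x)} P(X, S_{\mathrm{same}}^{(\mu)}(x)).$$

Thus there exists an $X = (x_1, x_2, \ldots, x_\mu)$ in the set $S_{\mathrm{same}}^{(\mu)}(x)$ where $x_1 = x$, and the above inequality
holds for this specific $X$,

$$\rho(\mathbf{T}^{(\mu)}_{x,x}) \le P(X, S_{\mathrm{same}}^{(\mu)}(x)).$$

From the "elitist selection" property (19),

$$P(X, S_{\mathrm{same}}^{(\mu)}(x)) + P(X, S_{\mathrm{high}}^{(\mu)}(x)) = 1,$$

then

$$\rho(\mathbf{T}^{(\mu)}_{x,x}) \le 1 - P(X, S_{\mathrm{high}}^{(\mu)}(x)). \tag{29}$$

Then from the "global mutation property" (18), $\forall \mu \ge 2$,

$$P(X, S_{\mathrm{high}}^{(\mu)}(x)) > P(x, S_{\mathrm{high}}^{(1)}(x)) = 1 - P(x, x).$$

Recalling that $P(x_p, x_p)$ is the maximum, then

$$1 - P(x, x) \ge 1 - P(x_p, x_p) = 1 - \rho(\mathbf{T}^{(1)}).$$

Combining it with Inequality (29),

$$\rho(\mathbf{T}^{(\mu)}_{x,x}) < \rho(\mathbf{T}^{(1)}).$$

Recalling (28):

$$\rho(\mathbf{T}^{(\mu)}) = \rho(\mathbf{T}^{(\mu)}_{x,x}),$$

then

$$\rho(\mathbf{T}^{(\mu)}) < \rho(\mathbf{T}^{(1)}), \qquad \frac{1 - \rho(\mathbf{T}^{(\mu)})}{1 - \rho(\mathbf{T}^{(1)})} > 1.$$

This means $\mathrm{speedup}(\mu) > 1$. □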
5.2 No Diversity, No Super Linear Population speedup 

Here "no diversity" means that only the super individual is kept after elitist selection. The parent
super individual generates $\mu$ children, and it is replaced only if a better child is produced. In this case, the
$(\mu+\mu)$ elitist EA degenerates into a $(1+\mu)$ EA. Using "no diversity" selection will not produce super linear
population speedup. This is confirmed by the following proposition.

Proposition 3. Given a $(1+1)$ elitist EA and a $(\mu+\mu)$ elitist EA (where $\mu \ge 2$) using the same mutation
operator for maximizing the same fitness function, if the $(\mu+\mu)$ EA uses "no diversity" elitist selection, then
no super linear population speedup happens.

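Proof. For the $(1+1)$ elitist EA, let $x_p \in S_{\mathrm{non}}^{(1)}$ be the individual such that

$$x_p := \arg\max\{P(x, x);\ x \in S_{\mathrm{non}}^{(1)}\}.$$

According to Lemma 6,

$$\rho(\mathbf{T}^{(1)}) = P(x_p, x_p).$$

For the $(\mu+\mu)$ elitist EA, consider the transition matrix $\mathbf{T}^{(\mu)}_{x_p,x_p}$. Let $X_p = (x_p, \ldots, x_p)$.

Since the EA uses "no diversity" elitist selection, and only one super individual is kept after selection,
then

1. the probability of going from state $X_p$ to any other state $Y$ in $S_{\mathrm{same}}^{(\mu)}(x_p) \setminus \{X_p\}$ is 0;

2. the probability of going from a state $X$ in $S_{\mathrm{same}}^{(\mu)}(x_p) \setminus \{X_p\}$ to a state $Y$ in $S_{\mathrm{same}}^{(\mu)}(x_p) \setminus \{X_p\}$ is 0, since after
"no diversity" selection, $X$ becomes $X_p$ if it does not enter the higher fitness level.

Thus $\mathbf{T}^{(\mu)}_{x_p,x_p}$ is reduced to a lower triangular matrix whose first row and column are indexed by $X_p$ and whose
remaining rows and columns are indexed by $S_{\mathrm{same}}^{(\mu)}(x_p) \setminus \{X_p\}$:

$$\mathbf{T}^{(\mu)}_{x_p,x_p} = \begin{pmatrix} P(X_p, X_p) & \mathbf{0} \\ * & \mathbf{O} \end{pmatrix},$$

where the part $*$ plays no role in the analysis.
The spectral radius of $\mathbf{T}^{(\mu)}_{x_p,x_p}$ is

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) = P(X_p, X_p).$$

From the "elitist selection" property (19),

$$P(X_p, X_p) + P(X_p, S_{\mathrm{high}}^{(\mu)}(x_p)) = 1,$$

and then

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) = 1 - P(X_p, S_{\mathrm{high}}^{(\mu)}(x_p)).$$

From the "global mutation" property (18), $\forall \mu \ge 2$,

$$P(X_p, S_{\mathrm{high}}^{(\mu)}(x_p)) < \mu P(x_p, S_{\mathrm{high}}^{(1)}(x_p)),$$

and then

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) = 1 - P(X_p, S_{\mathrm{high}}^{(\mu)}(x_p)) > 1 - \mu P(x_p, S_{\mathrm{high}}^{(1)}(x_p)) = 1 - \mu(1 - P(x_p, x_p)) = 1 - \mu(1 - \rho(\mathbf{T}^{(1)})).$$

According to Lemma 7,

$$\rho(\mathbf{T}^{(\mu)}) = \max_{x \in S_{\mathrm{non}}^{(1)}} \rho(\mathbf{T}^{(\mu)}_{x,x}),$$

then

$$\rho(\mathbf{T}^{(\mu)}) > 1 - \mu(1 - \rho(\mathbf{T}^{(1)})), \qquad \frac{1 - \rho(\mathbf{T}^{(\mu)})}{1 - \rho(\mathbf{T}^{(1)})} < \mu.$$

This means $\mathrm{speedup}(\mu) < \mu$. □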
Example 11. Super linear population speedup never happens to $(1+\mu)$ EAs because they have no diversity preservation.
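The bound in Proposition 3 can be made concrete under one simplifying assumption: each of the $\mu$ children is an independent mutation of the single kept parent, so the probability of staying at $X_p$ is $P(x_p, x_p)^{\mu}$ and hence $\rho(\mathbf{T}^{(\mu)}) = \rho(\mathbf{T}^{(1)})^{\mu}$. The sketch below evaluates the resulting speedup; the function name and the sample values are hypothetical.

```python
def no_diversity_speedup(rho_t1: float, mu: int) -> float:
    """Speedup of a (1+mu)-style EA, assuming rho(T^(mu)) = rho(T^(1)) ** mu."""
    rho_tmu = rho_t1 ** mu
    return (1.0 - rho_tmu) / (1.0 - rho_t1)


# Hypothetical values of rho(T^(1)); the speedup stays strictly between 1 and mu.
for rho in (0.5, 0.9, 0.99):
    for mu in (2, 4, 8):
        s = no_diversity_speedup(rho, mu)
        print(rho, mu, round(s, 4), 1.0 < s < mu)
```

Under this assumption the speedup equals the geometric sum $1 + \rho + \cdots + \rho^{\mu-1}$ with $\rho = \rho(\mathbf{T}^{(1)}) < 1$, which is why it approaches $\mu$ for hard problems but never exceeds it.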

5.3 Super Linear Population speedup Never Happens to Regular Monotonic 
Fitness Functions 

Using a population does not bring a super linear benefit when elitist EAs maximize a regular monotonic fitness
function. The following proposition shows that super linear speedup never happens to regular monotonic fitness functions.

The meaning of a "regular" monotonic fitness function is as follows: given two states $x$ and $y$, if $x$ has a better
fitness than $y$, then starting from $x$, the EA has a larger probability of entering the higher fitness level than
starting from $y$. The formal definition is given below:

Definition 8. A monotonic fitness function is called regular with respect to a $(1+1)$ EA if $\forall x, y$ such that
$f(x) > f(y)$,

$$\Pr(\phi_{t+1} \in S_{\mathrm{high}}^{(1)}(x) \mid \phi_t = x) > \Pr(\phi_{t+1} \in S_{\mathrm{high}}^{(1)}(x) \mid \phi_t = y). \tag{30}$$

Proposition 4. Given a $(1+1)$ elitist EA and a $(\mu+\mu)$ elitist EA (where $\mu \ge 2$) using the same mutation
operator for maximizing the same fitness function, if the fitness function is a regular monotonic function with respect to
the $(1+1)$ elitist EA, then no super linear population speedup happens.

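Proof. For the $(1+1)$ elitist EA, let $x_p \in S_{\mathrm{non}}^{(1)}$ be the individual such that

$$x_p := \arg\max\{P(x, x);\ x \in S_{\mathrm{non}}^{(1)}\},$$

then from Lemma 6,

$$\rho(\mathbf{T}^{(1)}) = P(x_p, x_p).$$

For the $(\mu+\mu)$ elitist EA, consider the transition matrix $\mathbf{T}^{(\mu)}_{x_p,x_p}$.

From Lemma 9, the spectral radius of the matrix $\mathbf{T}^{(\mu)}_{x_p,x_p}$ is lower bounded by

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) \ge \min_{X \in S_{\mathrm{same}}^{(\mu)}(x_p)} P(X, S_{\mathrm{same}}^{(\mu)}(x_p)).$$

Then there exists some $X \in S_{\mathrm{same}}^{(\mu)}(x_p)$ such that

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) \ge P(X, S_{\mathrm{same}}^{(\mu)}(x_p)).$$

From the "elitist selection" property (19),

$$P(X, S_{\mathrm{same}}^{(\mu)}(x_p)) + P(X, S_{\mathrm{high}}^{(\mu)}(x_p)) = 1,$$

then

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) \ge 1 - P(X, S_{\mathrm{high}}^{(\mu)}(x_p)). \tag{31}$$

Write $X = (x_1, x_2, \ldots, x_\mu)$ where $x_1 = x_p$.
From the "global mutation" property (18), $\forall \mu \ge 2$,

$$P(X, S_{\mathrm{high}}^{(\mu)}(x_p)) < \sum_{i=1}^{\mu} P(x_i, S_{\mathrm{high}}^{(1)}(x_p)). \tag{32}$$

Since the fitness function is regular monotonic with respect to the $(1+1)$ EA, the higher the fitness of an
individual, the larger its probability of moving to the higher fitness level. It follows that for all individuals
$x_1, \ldots, x_\mu$ in $X$,

$$P(x_i, S_{\mathrm{high}}^{(1)}(x_p)) \le P(x_p, S_{\mathrm{high}}^{(1)}(x_p)).$$

Combining it with Inequality (32),

$$P(X, S_{\mathrm{high}}^{(\mu)}(x_p)) < \mu P(x_p, S_{\mathrm{high}}^{(1)}(x_p)) = \mu(1 - P(x_p, x_p)) = \mu(1 - \rho(\mathbf{T}^{(1)})).$$

Combining the above inequality with Inequality (31), it follows that

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) > 1 - \mu(1 - \rho(\mathbf{T}^{(1)})),$$

and from Lemma 7,

$$\rho(\mathbf{T}^{(\mu)}) = \max_{x \in S_{\mathrm{non}}^{(1)}} \rho(\mathbf{T}^{(\mu)}_{x,x}),$$

then it follows that

$$\rho(\mathbf{T}^{(\mu)}) > 1 - \mu(1 - \rho(\mathbf{T}^{(1)})), \qquad \frac{1 - \rho(\mathbf{T}^{(\mu)})}{1 - \rho(\mathbf{T}^{(1)})} < \mu.$$

This means $\mathrm{speedup}(\mu) < \mu$. □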

Example 12. Consider a $(1+1)$ EA using bitwise mutation with a mutation rate $1/n$ and elitist selection
for solving the One-Max function.

In this example, the higher the fitness of an individual state, the larger its probability of moving to the higher
fitness level, so the One-Max function is a regular monotonic function with respect to the $(1+1)$ EA.

5.4 Bridgeable Point: Necessity of Super Linear Population speedup 

As seen in the previous section, super linear speedup never happens to regular monotonic fitness functions.
This conclusion can be generalized into a necessary condition for super linear speedup: "bridgeable state".

Definition 9. A state $y$ is called a bridgeable state of the state $x$ if $y$ satisfies two conditions:

1. $f(y) \le f(x)$;

2. Through mutation, the probability of going from state $y$ to the higher fitness level is larger than that
from state $x$ to the higher fitness level:

$$\Pr(\phi_{t+1/2} \in S_{\mathrm{high}}^{(1)}(x) \mid \phi_t = y) > \Pr(\phi_{t+1/2} \in S_{\mathrm{high}}^{(1)}(x) \mid \phi_t = x). \tag{33}$$

The name "bridgeable state" comes from an intuitive meaning:

• the fitness of the state $y$ is not better than that of $x$; after applying the mutation operator to these
two states separately, the probability of entering the higher fitness level from the state $y$ is larger than
that from the state $x$. So $y$ can be taken as a bridge towards the higher fitness level.

The following proposition confirms that "bridgeable state" is a necessary condition of super linear speedup.

Proposition 5. Given a $(1+1)$ elitist EA and a $(\mu+\mu)$ elitist EA (where $\mu \ge 2$) using the same mutation
operator for maximizing the same fitness function, if super linear speedup happens, then for any $x_p$ such that

$$x_p := \arg\max\{P(x, x);\ x \in S_{\mathrm{non}}^{(1)}\},$$

there exists a bridgeable state of $x_p$.

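Proof. For the $(1+1)$ elitist EA, from Lemma 6,

$$\rho(\mathbf{T}^{(1)}) = P(x_p, x_p).$$

Assume that there exists no bridgeable state of $x_p$; then for all $x$ with $f(x) \le f(x_p)$,

$$\Pr(\phi_{t+1/2} \in S_{\mathrm{high}}^{(1)}(x_p) \mid \phi_t = x) \le \Pr(\phi_{t+1/2} \in S_{\mathrm{high}}^{(1)}(x_p) \mid \phi_t = x_p).$$

For the $(\mu+\mu)$ elitist EA, consider the transition matrix $\mathbf{T}^{(\mu)}_{x_p,x_p}$.
From Lemma 9, the spectral radius of the matrix $\mathbf{T}^{(\mu)}_{x_p,x_p}$ is lower bounded by

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) \ge \min_{X \in S_{\mathrm{same}}^{(\mu)}(x_p)} P(X, S_{\mathrm{same}}^{(\mu)}(x_p)). \tag{34}$$

Since the set $S_{\mathrm{same}}^{(\mu)}(x_p)$ is a finite set, there exists some $X \in S_{\mathrm{same}}^{(\mu)}(x_p)$ such that

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) \ge P(X, S_{\mathrm{same}}^{(\mu)}(x_p)). \tag{35}$$

From the "elitist selection" property (19),

$$P(X, S_{\mathrm{same}}^{(\mu)}(x_p)) + P(X, S_{\mathrm{high}}^{(\mu)}(x_p)) = 1,$$

then

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) \ge 1 - P(X, S_{\mathrm{high}}^{(\mu)}(x_p)). \tag{36}$$

Write $X = (x_1, \ldots, x_\mu)$ with $x_1 = x_p$.

From the "global mutation" property (18), $\forall \mu \ge 2$,

$$P(X, S_{\mathrm{high}}^{(\mu)}(x_p)) < \sum_{i=1}^{\mu} P(x_i, S_{\mathrm{high}}^{(1)}(x_p)). \tag{37}$$

According to the assumption, there is no bridgeable state of $x_p$; then $\forall i = 1, \ldots, \mu$,

$$\Pr(\phi_{t+1/2} \in S_{\mathrm{high}}^{(1)}(x_p) \mid \phi_t = x_i) \le \Pr(\phi_{t+1/2} \in S_{\mathrm{high}}^{(1)}(x_p) \mid \phi_t = x_p),$$

that is,

$$P(x_i, S_{\mathrm{high}}^{(1)}(x_p)) \le P(x_p, S_{\mathrm{high}}^{(1)}(x_p)).$$

Combining the above inequality with Inequality (37), it follows that

$$P(X, S_{\mathrm{high}}^{(\mu)}(x_p)) < \sum_{i=1}^{\mu} P(x_i, S_{\mathrm{high}}^{(1)}(x_p)) \le \mu P(x_p, S_{\mathrm{high}}^{(1)}(x_p)) = \mu(1 - \rho(\mathbf{T}^{(1)})).$$

Combining the above inequality with Inequality (36), it follows that

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) > 1 - \mu(1 - \rho(\mathbf{T}^{(1)})).$$

From Lemma 7,

$$\rho(\mathbf{T}^{(\mu)}) = \max_{x \in S_{\mathrm{non}}^{(1)}} \rho(\mathbf{T}^{(\mu)}_{x,x}),$$

then it follows that

$$\rho(\mathbf{T}^{(\mu)}) > 1 - \mu(1 - \rho(\mathbf{T}^{(1)})), \qquad \frac{1 - \rho(\mathbf{T}^{(\mu)})}{1 - \rho(\mathbf{T}^{(1)})} < \mu.$$

This means $\mathrm{speedup}(\mu) < \mu$.

This contradicts the condition that super linear speedup happens: $\mathrm{speedup}(\mu) > \mu$. Hence the
assumption does not hold. In other words, a bridgeable state of $x_p$ must exist. □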

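Example 13. Consider a $(\mu+\mu)$ EA (where $\mu \ge 2$) using bitwise mutation with the mutation rate $1/n$ and
elitist proportional selection for solving the Fully-Deceptive function.

In the Fully-Deceptive function, the state $(1 \cdots 1)$ has "bridgeable states" like $(01 \cdots 1)$, $(101 \cdots 1)$, $\ldots$
and $(1 \cdots 10)$.

5.5 Diversity Preservation: Necessity of Super Linear Population speedup

The following proposition shows that "diversity preservation" is another necessary condition of super linear speedup.
The concept of "diversity preservation" is defined as follows:

Definition 10. In a $(\mu+\mu)$ elitist EA (where $\mu \ge 2$), let $X = (x, \ldots, x)$, where all individuals are the same
$x \in S_{\mathrm{non}}^{(1)}$. "Diversity preservation" means that if the set $S_{\mathrm{same}}^{(\mu)}(x) \setminus \{X\}$ is not empty, the probability of going
from state $X$ to the set $S_{\mathrm{same}}^{(\mu)}(x) \setminus \{X\}$ must be greater than 0.

Proposition 6. Given a $(1+1)$ elitist EA and a $(\mu+\mu)$ elitist EA (where $\mu \ge 2$) using the same mutation
operator for maximizing the same fitness function, let $x_p$ be the individual such that

$$x_p := \arg\max\{P(x, x);\ x \in S_{\mathrm{non}}^{(1)}\},$$

and $X_p = (x_p, \ldots, x_p)$.

If super linear speedup happens, then the "diversity preservation" condition holds: the probability of going
from $X_p$ to the set $S_{\mathrm{same}}^{(\mu)}(x_p) \setminus \{X_p\}$ must be greater than 0.

Proof. For the $(1+1)$ elitist EA, let $x_p \in S_{\mathrm{non}}^{(1)}$ be the individual such that

$$x_p := \arg\max\{P(x, x);\ x \in S_{\mathrm{non}}^{(1)}\},$$

then from Lemma 6, $\rho(\mathbf{T}^{(1)}) = P(x_p, x_p)$.

Assume that the "diversity preservation" condition does not hold. Then for any $Y \in S_{\mathrm{same}}^{(\mu)}(x_p) \setminus \{X_p\}$,

$$P(X_p, Y) = 0. \tag{38}$$

So the transition matrix $\mathbf{T}^{(\mu)}_{x_p,x_p}$ reduces to a block lower triangular matrix whose first row and column are indexed
by $X_p$,

$$\mathbf{T}^{(\mu)}_{x_p,x_p} = \begin{pmatrix} P(X_p, X_p) & \mathbf{0} \\ * & * \end{pmatrix},$$

so that $P(X_p, X_p)$ is one of its eigenvalues and

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) \ge P(X_p, X_p). \tag{39}$$

From (38) and the "elitist selection" property (19),

$$P(X_p, X_p) + P(X_p, S_{\mathrm{high}}^{(\mu)}(x_p)) = 1.$$

Combining it with Inequality (39), it follows that

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) \ge 1 - P(X_p, S_{\mathrm{high}}^{(\mu)}(x_p)).$$

From the "global mutation" property (18), $\forall \mu \ge 2$,

$$P(X_p, S_{\mathrm{high}}^{(\mu)}(x_p)) < \mu P(x_p, S_{\mathrm{high}}^{(1)}(x_p)),$$

and so

$$\rho(\mathbf{T}^{(\mu)}_{x_p,x_p}) > 1 - \mu P(x_p, S_{\mathrm{high}}^{(1)}(x_p)) = 1 - \mu(1 - P(x_p, x_p)) = 1 - \mu(1 - \rho(\mathbf{T}^{(1)})).$$

According to Lemma 7,

$$\rho(\mathbf{T}^{(\mu)}) = \max_{x \in S_{\mathrm{non}}^{(1)}} \rho(\mathbf{T}^{(\mu)}_{x,x}),$$

then

$$\rho(\mathbf{T}^{(\mu)}) > 1 - \mu(1 - \rho(\mathbf{T}^{(1)})), \qquad \frac{1 - \rho(\mathbf{T}^{(\mu)})}{1 - \rho(\mathbf{T}^{(1)})} < \mu,$$

which means $\mathrm{speedup}(\mu) < \mu$. This contradicts the condition that super linear speedup happens. Hence
the assumption does not hold. In other words, the "diversity preservation" condition holds. □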

Example 14. An EA using elitist proportional selection satisfies the "diversity preservation" condition . 

5.6 Potential Super Linear speedup for Deceptive Fitness Functions 

Using a population is helpful for maximizing deceptive fitness functions. Super linear speedup could happen
under certain conditions.

1. There exist some bridgeable state(s) for those "bad" states. The transition probability from the bridgeable state(s)
to the optimal set is large, and the transition probability from those "bad" states to the bridgeable state(s) is also
large.

2. A population-based EA adopts a mechanism of fitness diversity preservation. Thus it is possible for
the EA to go through some bridgeable state(s).

The following proposition confirms the above intuition.

Proposition 7. Given a $(1+1)$ elitist EA and a $(\mu+\mu)$ elitist EA (where $\mu \ge 2$) using the same mutation
operator for maximizing the same fitness function, suppose that the fitness function is deceptive with respect to the $(1+1)$
EA. If the following three conditions hold:

1. Fitness Diversity Preservation: given $\Phi_t = X$ and $\Phi_{t+1/2} = Y$, if there exist one or more individuals in
$X$ or $Y$ whose fitness is less than that of the super individual of $X$, then at least one of these individuals
must be selected into the next population.

2. Bridgeable Point: let $x$ be any state at the 2nd highest fitness level; then all states at a lower fitness level are
bridgeable states and, through mutation, the probability of going from a bridgeable state $y$ to the optimal
set is larger than that from $x$, which satisfies

$$P(\phi_{t+1/2} \in S_{\mathrm{opt}}^{(1)} \mid \phi_t = y) > \mu P(\phi_{t+1/2} \in S_{\mathrm{opt}}^{(1)} \mid \phi_t = x). \tag{40}$$

3. Pass through bridgeable states: through mutation, the probability of going from the above $x$ to the set
of bridgeable states is large,

$$P(\phi_{t+1/2} \in \text{the set of bridgeable states} \mid \phi_t = x) > \mu P(\phi_{t+1/2} \in S_{\mathrm{opt}}^{(1)} \mid \phi_t = x). \tag{41}$$

Then super linear speedup happens: $\mathrm{speedup}(\mu) > \mu$.

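Proof. For the $(1+1)$ elitist EA, let $x$ be a state at the 2nd highest fitness level; then

$$m^{(1)}(x) = \frac{1}{1 - P(x, x)}.$$

Recall that in a deceptive function, the hitting time $m^{(1)}(x)$, starting from the 2nd highest fitness level, is the
largest. From Lemma 8, the spectral radius of the matrix $\mathbf{N}^{(1)}$ is upper bounded by

$$\rho(\mathbf{N}^{(1)}) \le \max_{z \in S_{\mathrm{non}}^{(1)}} m^{(1)}(z) = m^{(1)}(x) = \frac{1}{1 - P(x, x)}. \tag{42}$$

From Lemma 6,

$$\rho(\mathbf{N}^{(1)}) = \frac{1}{1 - \rho(\mathbf{T}^{(1)})} = \max_{z \in S_{\mathrm{non}}^{(1)}} \frac{1}{1 - P(z, z)}. \tag{43}$$

Combining (42) and (43) together gives

$$P(x, x) = \rho(\mathbf{T}^{(1)}) = \max_{z \in S_{\mathrm{non}}^{(1)}} P(z, z).$$

For the $(\mu+\mu)$ EA, let $X$ be any state in the non-optimal set. Now consider the probability of going from
state $X$ to the optimal set in two generations.

Case 1: $X_p = (x_p, \ldots, x_p)$ where $x_p$ is at the 2nd highest fitness level.

From Conditions (40) and (41), the probability of going from $X_p$ to the optimal set in two generations is greater
than

$$P(\Phi_{t+2} \in S_{\mathrm{opt}}^{(\mu)} \mid \Phi_{t+1} \text{ containing a bridgeable state}) \times P(\Phi_{t+1} \text{ containing a bridgeable state} \mid \Phi_t = X_p) > \mu^2 \left(P(\phi_{t+1} \in S_{\mathrm{opt}}^{(1)} \mid \phi_t = x_p)\right)^2.$$

Case 2: $X = (x_1, \ldots, x_\mu)$ containing at least one bridgeable state.

From Condition (40), the probability of going from $X$ to the optimal set in two generations is greater than

$$P(\Phi_{t+2} \in S_{\mathrm{opt}}^{(\mu)} \mid \Phi_{t+1} \in S_{\mathrm{opt}}^{(\mu)}) P(\Phi_{t+1} \in S_{\mathrm{opt}}^{(\mu)} \mid \Phi_t = X) > \mu P(\phi_{t+1} \in S_{\mathrm{opt}}^{(1)} \mid \phi_t = x_p).$$

Combining the above two cases together, it follows that for all states $X$ in the non-optimal set,

$$P(\Phi_{t+2} \in S_{\mathrm{opt}}^{(\mu)} \mid \Phi_t = X) > 1 - \left(1 - \mu P(\phi_{t+1} \in S_{\mathrm{opt}}^{(1)} \mid \phi_t = x_p)\right)^2.$$

Equivalently, for all states $X$ in the non-optimal set,

$$P(\Phi_{t+2} \in S_{\mathrm{non}}^{(\mu)} \mid \Phi_t = X) < \left(1 - \mu P(\phi_{t+1} \in S_{\mathrm{opt}}^{(1)} \mid \phi_t = x_p)\right)^2.$$

Write the above inequality in the $\infty$-norm*:

$$\| (\mathbf{T}^{(\mu)})^2 \|_\infty < \left(1 - \mu(1 - P(x_p, x_p))\right)^2.$$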



*For a square matrix $\mathbf{A} = [a_{ij}]$, $\|\mathbf{A}\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^{m} |a_{ij}|$, which is simply the maximum absolute row sum of the matrix.
See wikipedia.org under the entry "Matrix Norm".

Since the spectral radius of a matrix is no more than its maximum norm* (p. 619 in [28]), it yields

$$\left(\rho(\mathbf{T}^{(\mu)})\right)^2 \le \| (\mathbf{T}^{(\mu)})^2 \|_\infty,$$

then

$$\rho(\mathbf{T}^{(\mu)}) < 1 - \mu(1 - P(x_p, x_p)) = 1 - \mu(1 - \rho(\mathbf{T}^{(1)})),$$

and it follows that

$$\frac{1 - \rho(\mathbf{T}^{(\mu)})}{1 - \rho(\mathbf{T}^{(1)})} > \mu.$$

This means super linear speedup happens. □

Example 15. Consider a $(1+1)$ EA and a $(\mu+\mu)$ EA (where $\mu \ge 2$) using bitwise mutation with the
mutation rate $1/n$ and elitist selection for solving the Fully-Deceptive function. The $(\mu+\mu)$ EA uses selection
with fitness diversity preservation.

Recall that in the Fully-Deceptive problem, the unique optimal solution is $(0 \cdots 0)$, which has the highest
fitness $n + 1$, and $(1 \cdots 1)$ has the 2nd highest fitness $n$. In the $(1+1)$ EA, all states $y$ with fitness
$f(y) < n$ are bridgeable points of the state $(1 \cdots 1)$.

Through mutation, the probability of going from state $x = (1 \cdots 1)$ to $(0 \cdots 0)$ is



-1/2 e ^Ipl I 0* = x) = , 

and through mutation, the probability of going from state y with fitness f{y) < n to (0 • • • 0) 

and through mutation, the probability of going from state x — (1 • • ■ 1) to the set of bridgeable states is 

P{4>t+i/2 £ the set of bridgeable states \ (j)t — x) 

Then Conditions (40) and (41) hold for any $\mu$ such that $1 < \mu < n$. So super linear speedup happens for
any $\mu$ such that $1 < \mu < n$.



5.7 "Road through Bridge": a Sufficient Condition 

To achieve super linear population speedup, it is important for an EA to go through some "bridgeable state" . 
This leads to a sufficient condition for super linear population speedup, called the "road through bridge" 
condition. 

The intuitive meaning of the "road through bridge" condition is described as follows: there are two types 
of roads from a state towards the higher fitness level. One is the road jumping from its current fitness level 
directly towards the higher fitness level; another type is the road through some bridgeable state at a lower 
or the same fitness level before reaching the higher fitness level. 

The "road" is an intuitive description of a transition between two states $X$ and $Y$. It is defined as follows.

*Any induced norm satisfies the inequality $\|\mathbf{A}\| \ge \rho(\mathbf{A})$. See wikipedia.org under the entry "Matrix Norm".

Definition 11. Given two states $X, Y \in S^{(\mu)}$, if there exist states $X_0 = X \to X_1 \to \cdots \to X_k = Y$ such
that

$$P(X_0, X_1) \cdots P(X_{k-1}, X_k) > 0,$$

then $\{X_0, \ldots, X_k\}$ is called a road from $X$ to $Y$, denoted by $\mathrm{road}(X, Y, k)$, and $k$ is called the length of the
road.

Denote the probability of passing the road as

$$P(\mathrm{road}(X, Y, k)) = P(X_0, X_1) \cdots P(X_{k-1}, X_k).$$

Given a state $X$ whose super individual is $x$ and a state $Y$ in the set $S_{\mathrm{high}}^{(\mu)}(x)$, roads from $X$ to $Y$ can be
classified into two categories:

• $\mathrm{road}(X, Y, k)$ through bridge: at least one of the intermediate states $X_1, \ldots, X_{k-1}$ is a bridgeable state of
$x$.

• $\mathrm{road}(X, Y, k)$ over gap: none of the intermediate states $X_1, \ldots, X_{k-1}$ is a bridgeable state of $x$.
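The road probability and the two road categories can be sketched on a toy chain. The state labels, the transition probabilities, and the set of bridgeable states below are all hypothetical; the sketch treats states abstractly and only mirrors the product formula of Definition 11.

```python
def road_probability(P, road):
    """Product of one-step transition probabilities along the road X_0 -> ... -> X_k."""
    prob = 1.0
    for a, b in zip(road, road[1:]):
        prob *= P.get((a, b), 0.0)       # missing entries mean probability 0
    return prob


def goes_through_bridge(road, bridgeable):
    """True if at least one intermediate state X_1, ..., X_{k-1} is bridgeable."""
    return any(state in bridgeable for state in road[1:-1])


# Hypothetical chain: X is the start, B a bridgeable state, Y the higher fitness level.
P = {("X", "X"): 0.79, ("X", "B"): 0.20, ("X", "Y"): 0.01,
     ("B", "B"): 0.50, ("B", "Y"): 0.50}
bridgeable = {"B"}

bridge_road = ["X", "B", "Y"]            # length 2, through bridge
gap_road = ["X", "X", "Y"]               # length 2, over gap

print(road_probability(P, bridge_road), goes_through_bridge(bridge_road, bridgeable))
print(road_probability(P, gap_road), goes_through_bridge(gap_road, bridgeable))
```

In this toy chain the road through the bridge carries far more probability than the direct jump, which is the situation the "road through bridge" condition is meant to capture.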

Denote by $P(\mathrm{road}(X, S_{\mathrm{high}}^{(\mu)}(x), k) \text{ through bridge})$ the probability of going from $X$ to the set $S_{\mathrm{high}}^{(\mu)}(x)$
through "roads through bridge" with the road length being $k$.

Denote by $P(\mathrm{road}(X, S_{\mathrm{high}}^{(\mu)}(x), k) \text{ over gap})$ the probability of going from $X$ to the set $S_{\mathrm{high}}^{(\mu)}(x)$ through
"roads over gap" with the road length being $k$.

A sufficient condition for super linear speedup is given as follows. 

Theorem 2. Given a $(1+1)$ elitist EA and a $(\mu+\mu)$ elitist EA (where $\mu \ge 2$) using the same mutation
operator for maximizing the same fitness function, let $x_p$ be the individual such that

$x_p := \arg\max\{P(x, x);\ x \in S_{\mathrm{non}}\}.$

For the $(\mu+\mu)$ EA, if the following road through bridge condition holds:

• $\exists k > 0$, $\forall x \in S_{\mathrm{non}}$ and $\forall X \in S^{(\mu)}_{\mathrm{same}}(x)$,

$P(road(X, S^{(\mu)}_{\mathrm{higher}}(x), k) \text{ over gap}) + P(road(X, S^{(\mu)}_{\mathrm{higher}}(x), k) \text{ through bridge}) > 1 - (1 - \mu(1 - P(x_p, x_p)))^k,$ (44)

then super linear speedup happens.

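Proof. For the $(1+1)$ elitist EA, from Lemma 6,

$\rho(T^{(1)}) = P(x_p, x_p).$

For the $(\mu+\mu)$ EA, from Lemma 7 it follows

$\rho(T^{(\mu)}) = \max_{x \in S_{\mathrm{non}}} \rho(T^{(\mu)}_{x,x}).$

Since the set $S_{\mathrm{non}}$ is finite, there is some $x \in S_{\mathrm{non}}$ such that

$\rho(T^{(\mu)}) = \rho(T^{(\mu)}_{x,x}).$ (45)

Consider the transition matrix $T^{(\mu)}_{x,x}$. Let $X$ be any state in the set $S^{(\mu)}_{\mathrm{same}}(x)$.
From Inequality (44) it follows

$\Pr(\Phi_k \in S^{(\mu)}_{\mathrm{higher}}(x) \mid \Phi_0 = X) > 1 - (1 - \mu(1 - \rho(T^{(1)})))^k.$

From the "elitist selection" property

$\Pr(\Phi_k \in S^{(\mu)}_{\mathrm{same}}(x) \mid \Phi_0 = X) + \Pr(\Phi_k \in S^{(\mu)}_{\mathrm{higher}}(x) \mid \Phi_0 = X) = 1$

it yields

$\Pr(\Phi_k \in S^{(\mu)}_{\mathrm{same}}(x) \mid \Phi_0 = X) < (1 - \mu(1 - \rho(T^{(1)})))^k.$

Since the above inequality holds for all states $X$ in the set $S^{(\mu)}_{\mathrm{same}}(x)$,

$\max_{X \in S^{(\mu)}_{\mathrm{same}}(x)} \Pr(\Phi_k \in S^{(\mu)}_{\mathrm{same}}(x) \mid \Phi_0 = X) < (1 - \mu(1 - \rho(T^{(1)})))^k.$

Write the inequality in the $\infty$-norm,

$\| (T^{(\mu)}_{x,x})^k \|_\infty < (1 - \mu(1 - \rho(T^{(1)})))^k.$

Since the spectral radius of a matrix is no more than its maximum norm (p. 619 in [28]), it yields

$\rho(T^{(\mu)}_{x,x}) \le \left( \| (T^{(\mu)}_{x,x})^k \|_\infty \right)^{1/k},$

then

$\rho(T^{(\mu)}_{x,x}) < 1 - \mu(1 - \rho(T^{(1)})),$

$\frac{1 - \rho(T^{(\mu)}_{x,x})}{1 - \rho(T^{(1)})} > \mu.$

Recalling (45), then

$\frac{1 - \rho(T^{(\mu)})}{1 - \rho(T^{(1)})} > \mu$

and this means the super linear speedup happens. □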
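The key step of the proof, bounding the spectral radius by the $\infty$-norm of a matrix power, can be checked numerically. The following is a minimal sketch under stated assumptions: the $2 \times 2$ block `T_mu` and the value `rho_1` are hypothetical numbers chosen for illustration, and the helper names are not from the paper. Entries are assumed non-negative, so the $\infty$-norm is the largest row sum.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inf_norm_of_power(T, k):
    """Return ||T^k||_inf, the maximum row sum of T^k (non-negative entries assumed)."""
    power = T
    for _ in range(k - 1):
        power = mat_mul(power, T)
    return max(sum(row) for row in power)

def certifies_super_linear(T_mu, rho_1, mu, k):
    """True if ||T_mu^k||_inf < (1 - mu (1 - rho_1))^k, which forces
    rho(T_mu) < 1 - mu (1 - rho_1), i.e. (1 - rho(T_mu)) / (1 - rho_1) > mu."""
    bound = 1 - mu * (1 - rho_1)
    return bound > 0 and inf_norm_of_power(T_mu, k) < bound ** k

# Hypothetical data: rho(T^(1)) = 9/10 and an upper triangular block for mu = 2.
rho_1 = Fraction(9, 10)
T_mu = [[Fraction(1, 2), Fraction(1, 5)],
        [Fraction(0), Fraction(7, 10)]]
print(certifies_super_linear(T_mu, rho_1, mu=2, k=1))
```

Here the block is triangular, so its spectral radius is its largest diagonal entry, 7/10, and the ratio $(1 - 7/10)/(1 - 9/10) = 3$ exceeds $\mu = 2$, in agreement with the norm test. Exact fractions are used so that the strict inequality is not affected by rounding.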
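
5.8 "Road through Bridge": a Necessary Condition

Furthermore, the "road through bridge" condition is a necessary condition of super linear speedup. The
following theorem confirms this assertion.

Theorem 3. Given a $(1+1)$ elitist EA and a $(\mu+\mu)$ elitist EA (where $\mu \ge 2$) using the same mutation
operator for maximizing the same fitness function, let $x_p$ be the individual such that

$x_p := \arg\max\{P(x, x);\ x \in S_{\mathrm{non}}\}.$

For the $(\mu+\mu)$ EA, if the road through bridge condition does not hold, i.e.,

• $\exists x \in S_{\mathrm{non}}$ and $\exists X \in S^{(\mu)}_{\mathrm{same}}(x)$, $\forall k > 0$,

$P(road(X, S^{(\mu)}_{\mathrm{higher}}(x), k) \text{ over gap}) + P(road(X, S^{(\mu)}_{\mathrm{higher}}(x), k) \text{ through bridge}) \le 1 - (1 - \mu(1 - P(x_p, x_p)))^k,$ (46)

then super linear speedup does not happen.

Proof. For the $(1+1)$ elitist EA, it follows from Lemma 5

$\rho(T^{(1)}) = P(x_p, x_p).$

For the $(\mu+\mu)$ EA, consider the transition matrix $T^{(\mu)}_{x,x}$.
From Inequality (46), it follows

$\Pr(\Phi_k \in S^{(\mu)}_{\mathrm{higher}}(x) \mid \Phi_0 = X) \le 1 - (1 - \mu(1 - \rho(T^{(1)})))^k.$

From the "elitist selection" property (15)

$\Pr(\Phi_k \in S^{(\mu)}_{\mathrm{same}}(x) \mid \Phi_0 = X) + \Pr(\Phi_k \in S^{(\mu)}_{\mathrm{higher}}(x) \mid \Phi_0 = X) = 1$

it yields

$\Pr(\Phi_k \in S^{(\mu)}_{\mathrm{same}}(x) \mid \Phi_0 = X) \ge (1 - \mu(1 - \rho(T^{(1)})))^k.$

And then

$\max_{Y \in S^{(\mu)}_{\mathrm{same}}(x)} \Pr(\Phi_k \in S^{(\mu)}_{\mathrm{same}}(x) \mid \Phi_0 = Y) \ge (1 - \mu(1 - \rho(T^{(1)})))^k.$

Write the inequality in the $\infty$-norm,

$\| (T^{(\mu)}_{x,x})^k \|_\infty \ge (1 - \mu(1 - \rho(T^{(1)})))^k.$

Let $k \to +\infty$, then from Gelfand's formula (p. 176 in [41]), it follows

$\rho(T^{(\mu)}_{x,x}) = \lim_{k \to +\infty} \left( \| (T^{(\mu)}_{x,x})^k \|_\infty \right)^{1/k}.$

Then

$\rho(T^{(\mu)}_{x,x}) \ge 1 - \mu(1 - \rho(T^{(1)})),$

$\frac{1 - \rho(T^{(\mu)}_{x,x})}{1 - \rho(T^{(1)})} \le \mu.$

From Lemma 7

$\rho(T^{(\mu)}) = \max_{x \in S_{\mathrm{non}}} \rho(T^{(\mu)}_{x,x})$

and it follows

$\frac{1 - \rho(T^{(\mu)})}{1 - \rho(T^{(1)})} \le \mu.$

This means super linear speedup does not happen. □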
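Gelfand's formula, used in the limit step above, can be observed numerically. The sketch below is an illustration only: the matrix `A` is a hypothetical upper triangular example (spectral radius 0.5, its largest diagonal entry) and `gelfand_estimate` is a helper name introduced here. It evaluates $\|A^k\|_\infty^{1/k}$ for growing $k$.

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def gelfand_estimate(A, k):
    """Return ||A^k||_inf ** (1/k), which tends to the spectral radius as k grows."""
    power = A
    for _ in range(k - 1):
        power = mat_mul(power, A)
    norm = max(sum(abs(v) for v in row) for row in power)
    return norm ** (1.0 / k)

# Hypothetical upper triangular matrix; its eigenvalues are 0.5 and 0.25.
A = [[0.5, 0.5],
     [0.0, 0.25]]
estimates = [gelfand_estimate(A, k) for k in (1, 10, 100)]
print(estimates)
```

For this matrix the estimates decrease towards 0.5 from above, which is the behaviour the proof relies on: every finite $k$ gives an upper bound on the spectral radius, and the bound becomes tight in the limit.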



Footnote: Gelfand's formula: for any induced matrix norm $\|\cdot\|$, the spectral radius satisfies $\rho(A) = \lim_{k \to \infty} \|A^k\|^{1/k}$; see wikipedia.org
under the entry "Spectral Radius".

Propositions 3 to 7 are special cases of the "road through bridge" condition.

• In Proposition 3, "no diversity" means that the "road through bridge" is not accessible.

• Proposition 4 shows that there is no bridgeable state for regular monotonic fitness functions.

• Proposition 5 demonstrates that the existence of "bridgeable state(s)" is necessary for super linear speedup.

• Proposition 6 confirms that "diversity preservation" is also necessary for super linear speedup.

• Proposition 7 provides an example of "roads through bridge" existing in some deceptive functions.

6 Discussions 

6.1 Other Types of EAs 

If a mutation operator is not global, then it can easily be modified to be global through mixed strategy
mutation [26]. Each individual chooses the non-global mutation operator with probability 0.999, and a
global mutation operator with probability 0.001. Then the new mixed strategy mutation is global.
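The mixed strategy can be sketched in a few lines. The example below is an illustration under assumptions that are not in the text: individuals are binary lists, the non-global operator is a one-bit flip, and the global operator is bitwise mutation with rate $1/n$. All function names are hypothetical.

```python
import random

def one_bit_flip(x, rng):
    """Non-global operator: flips exactly one bit, so only neighbours are reachable."""
    i = rng.randrange(len(x))
    return x[:i] + [1 - x[i]] + x[i + 1:]

def bitwise_mutation(x, rng):
    """Global operator: flips each bit independently with probability 1/n,
    so every binary string is reachable with positive probability."""
    n = len(x)
    return [1 - b if rng.random() < 1.0 / n else b for b in x]

def mixed_strategy_mutation(x, rng, p_global=0.001):
    """Use the global operator with probability p_global, else the non-global one."""
    if rng.random() < p_global:
        return bitwise_mutation(x, rng)
    return one_bit_flip(x, rng)

rng = random.Random(0)
child = mixed_strategy_mutation([0, 0, 0, 0], rng)
print(child)
```

Because the global branch is taken with positive probability, any state can be reached from any other state in one step, which is all the analysis requires. The mixing weight only affects how small those probabilities are.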

If a selection operator is not elitist, then it can be made elitist by adding a virtual super individual.
It plays the role of archiving the best solution found so far, but is not involved in the mutation or selection
procedure. In this scenario, the EA becomes a $(1+1) + (\mu+\mu)$ EA: the $(1+1)$ EA updates the best
solution, and the $(\mu+\mu)$ EA is the original one.
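A minimal sketch of the virtual super individual follows. It assumes a generic, possibly non-elitist generation function `step` and a `fitness` function, both supplied by the caller. The wrapper name `run_with_archive` is hypothetical and only illustrates the archiving idea.

```python
def run_with_archive(population, fitness, step, generations):
    """Run a (possibly non-elitist) EA while archiving the best solution found so far.

    The archived individual is never passed to `step`, so it takes no part in
    mutation or selection; it is only updated when a better solution appears.
    """
    best = max(population, key=fitness)
    for _ in range(generations):
        population = step(population)             # original (mu + mu) EA, unchanged
        challenger = max(population, key=fitness)
        if fitness(challenger) > fitness(best):   # (1 + 1)-style elitist update
            best = challenger
    return best, population

# Toy usage: a "bad" step that lowers every individual; the archive keeps the best.
best, final_population = run_with_archive(
    [3, 5], fitness=lambda v: v, step=lambda pop: [v - 1 for v in pop], generations=3
)
print(best, final_population)
```

The underlying population is free to lose good solutions, while the archived value never decreases, which is the elitist property needed by the analysis.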

Recombination operators such as crossover are widely used in EAs. In this case, it is still possible to
apply the fundamental matrix approach to the analysis of population speedup. Since a $(1+1)$ elitist EA
doesn't include a recombination operator, it is not an appropriate candidate as the benchmark EA.
Instead, a $(2+2)$ elitist EA with recombination will play the role of the benchmark EA, thus population
speedup is redefined by

The above EAs can still be modelled by absorbing Markov chains and it is possible to establish a similar
"road through bridge" condition. The following theorem could be taken as a starting point for studying their
population speedup.

Theorem 4. Given a $(\nu+\nu)$ EA and a $(\mu+\mu)$ EA using the same genetic operator(s) for maximizing the
same fitness function. For the $(\mu+\mu)$ EA, population speedup

$\frac{\rho(N^{(\nu)})}{\rho(N^{(\mu)})} > \frac{\mu}{\nu}$

if and only if the following condition holds:

• $\exists k > 0$, $\forall X \in S^{(\mu)}_{\mathrm{non}}$,

$P(road(X, S^{(\mu)}_{\mathrm{opt}}, k)) > 1 - \left(1 - \frac{\mu}{\nu}(1 - \rho(T^{(\nu)}))\right)^k.$ (48)

Proof. Following the same proof as for Theorems 2 and 3, it comes to the conclusion. □

It is not easy to make a general study in population scalability of EAs using dynamic genetic operators 
adapting to generation t. A potential solution is to define population speedup for every fixed generation t, 
i.e., 

^ (49) 

In this scenario, population speedup varies over generation $t$. This implies a link to the dynamical setting of
population sizes.

6.2 Counterexamples

A question arises whether or not similar conclusions can be drawn if the asymptotic hitting time is replaced
by the worst case or average case hitting time. The answer is negative. For example, consider the intuition:
"increasing the population size may reduce the number of generations for an EA to find the optimal state".
The following counterexamples show that this is wrong for both the worst case and the average case hitting
time, although it has been proven true under the asymptotic hitting time (see Proposition 2).

Example 16. The fitness function is given in Table 2.



Table 2: Fitness function.

state     $x_0$   $x_1$   $x_2$   $x_3$   $x_4$
fitness   5       4       3       2       1



Consider the $(\mu+\mu)$ EA (= $(1+\mu)$ EA): the super individual is reproduced by $\mu$ copies and each copy
generates a child by mutation; and the super individual is replaced only when a child is better than it. The
mutation transition probabilities $P(x, y)$, $x, y = x_0, \cdots, x_4$, are given in Table 3, where $\epsilon > 0$ is a sufficiently
small constant (the size of $\epsilon$ will be discussed later); $\epsilon$ plays the role of making the mutation operator global.



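Table 3: Mutation transition probability matrix.

state   $x_0$           $x_1$           $x_2$        $x_3$             $x_4$
$x_0$   $1-4\epsilon$   $\epsilon$      $\epsilon$   $\epsilon$        $\epsilon$
$x_1$   $1-4\epsilon$   $\epsilon$      $\epsilon$   $\epsilon$        $\epsilon$
$x_2$   $\epsilon$      $1-4\epsilon$   $\epsilon$   $\epsilon$        $\epsilon$
$x_3$   $1-4\epsilon$   $\epsilon$      $\epsilon$   $\epsilon$        $\epsilon$
$x_4$   $\epsilon$      $\epsilon$      $0.5$        $0.5-3\epsilon$   $\epsilon$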



First set $\epsilon = 0$. The worst case hitting time for the $(1+1)$ EA is the time when the EA starts from $x_4$.
According to the probability transition matrix above, after the first step it takes 2 further time steps (with
probability 1) to reach the fittest individual $x_0$ if $x_4$ mutates to $x_2$, and this event happens with probability 0.5,
while it takes 1 further time step if $x_4$ mutates to $x_3$ (also with probability 0.5). Thus, by definition of the
expectation,

$m^{(1)}(x_4) = 2 \cdot 0.5 + 3 \cdot 0.5 = 2.5.$

The worst case hitting time for the $(2+2)$ EA is the time when the EA starts from $(x_4, x_4)$. Because of
the "greedy selection" rule the only possible populations in the second generation are $(x_3, x_3)$ and $(x_2, x_2)$
and, since the fitness of the individual $x_3$ is lower than that of $x_2$, the only way that after the first time
step the new population is $(x_3, x_3)$ is if both copies of $x_4$ mutate into the individual $x_3$, which happens with
probability $0.5 \cdot 0.5 = 0.25$. Consequently, the probability that the population in the second generation is
$(x_2, x_2)$ is $1 - 0.25 = 0.75$. Thus, according to the mutation probability transition matrix it takes 2 time
steps to reach the optimum with probability 0.25 while it takes 3 time steps to do so with probability 0.75
so that

$m^{(2)}(x_4, x_4) = 2 \cdot 0.25 + 3 \cdot 0.75 = 2.75,$

explicitly showing that the worst case hitting time for the $(1+1)$ EA is shorter than that of the $(2+2)$ EA, i.e.

$m^{(1)}(x_4) < m^{(2)}(x_4, x_4).$ (50)

In other words, using a larger population size may increase the number of generations for the EA to find an
optimal state in the worst case. Furthermore, the reasoning above generalizes to the case when $\mu > 2$ and
shows that $\forall \mu \in \mathbb{N}$ we have

$m^{(\mu)}(\underbrace{x_4, x_4, \ldots, x_4}_{\mu \text{ times}}) = 2 \cdot 0.5^{\mu} + 3 \cdot (1 - 0.5^{\mu}),$

thereby demonstrating that $m^{(\mu)}(x_4, x_4, \ldots, x_4)$ ($\mu$ times) is a strictly increasing function of the population size
(i.e. increasing the population size also increases the algorithm complexity starting with the worst case).
Now observe that

$m^{(1)}(x_4) - m^{(2)}(x_4, x_4)$

is a continuous function of $\epsilon$ so that for small enough $\epsilon$ the inequality (50) as well as the conclusion in
the paragraph above still hold. Moreover, notice that the continuity argument implies that the greedy rule
for selection can also be relaxed to allow the non-super individuals to enter the new population with
sufficiently tiny probabilities so that all the same conclusions remain valid.
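The hitting times in Example 16 can be recomputed exactly, both for $\epsilon = 0$ and for a small positive $\epsilon$. The sketch below is not the authors' code. It assumes the rows of Table 3 as reconstructed above, indexes states so that a smaller index means a higher fitness, and models the $(1+\mu)$ EA by letting the best of $\mu$ independent children replace the super individual only if it is strictly better. The function names are hypothetical.

```python
from fractions import Fraction

def mutation_matrix(eps):
    """Table 3: row i holds the mutation probabilities from x_i to x_0, ..., x_4."""
    e = Fraction(eps)
    half = Fraction(1, 2)
    return [
        [1 - 4 * e, e, e, e, e],
        [1 - 4 * e, e, e, e, e],
        [e, 1 - 4 * e, e, e, e],
        [1 - 4 * e, e, e, e, e],
        [e, e, half, half - 3 * e, e],
    ]

def expected_hitting_times(P, mu):
    """Expected number of generations to reach x_0 when all mu individuals equal x_i."""
    n = len(P)
    m = [Fraction(0)] * n
    for i in range(1, n):
        # tail[j] = probability that a single child has index >= j (is no better than x_j)
        tail = [sum(P[i][j:]) for j in range(n + 1)]
        stay = tail[i] ** mu                  # no child is strictly better than x_i
        total = Fraction(1)
        for j in range(i):
            move = tail[j] ** mu - tail[j + 1] ** mu   # best child is exactly x_j
            total += move * m[j]
        m[i] = total / (1 - stay)
    return m

for mu in (1, 2, 3):
    print(mu, expected_hitting_times(mutation_matrix(0), mu)[4])
```

With $\epsilon = 0$ the values for $x_4$ agree with $2 \cdot 0.5^{\mu} + 3 \cdot (1 - 0.5^{\mu})$, and repeating the computation with a small positive value such as $\epsilon = 1/1000$ keeps the $(1+1)$ EA ahead of the $(2+2)$ EA, in line with the continuity argument.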

The next example shows that using a larger population size may increase the average case hitting time. 

Example 17. The fitness function is given in Table 4.



Table 4: Fitness function.

state     $x_0$   $x_1$   $x_2$   $x_3$   $x_i$ ($i = 4, \cdots, 103$)
fitness   5       4       3       2       1



Consider the $(\mu+\mu)$ EA (= $(1+\mu)$ EA): the super individual is reproduced by $\mu$ copies and each copy
generates a child by mutation; and the super individual is replaced only when a child is better than it. The
mutation transition probabilities are given in Table 5, where $\epsilon > 0$ is a sufficiently small constant just as in
the previous example.



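Table 5: Mutation transition probability matrix.

state              $x_0$           $x_1$           $x_2$        $x_3$             $x_i$ ($i \ge 4$)
$x_0$              $1-4\epsilon$   $\epsilon$      $\epsilon$   $\epsilon$        $0.01\epsilon$
$x_1$              $1-4\epsilon$   $\epsilon$      $\epsilon$   $\epsilon$        $0.01\epsilon$
$x_2$              $\epsilon$      $1-4\epsilon$   $\epsilon$   $\epsilon$        $0.01\epsilon$
$x_3$              $1-4\epsilon$   $\epsilon$      $\epsilon$   $\epsilon$        $0.01\epsilon$
$x_i$ ($i \ge 4$)  $\epsilon$      $\epsilon$      $0.5$        $0.5-3\epsilon$   $0.01\epsilon$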



The only difference from the previous example is that now there are 100 "bad" individuals with the same
longest hitting time (rather than only 1 individual, as in the previous example), compared with only 4 "good"
points with shorter hitting times. Thus the average case hitting time is almost the same as the worst case
hitting time. The size of $\epsilon > 0$ can be chosen sufficiently small, and the greediness of selection can be
relaxed according to the same type of continuity argument as in the previous example, to show
that the $(1+1)$ EA outperforms the $(2+2)$ or $(3+3)$ EA.
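The effect of the 100 "bad" states on the average can be made concrete with a short calculation. The sketch below rests on assumptions that the text does not fix: $\epsilon = 0$, the average is taken uniformly over all 104 initial states (every individual in the population equal to that state), and the per-state hitting times are those obtained from the reasoning of Example 16. The function name is hypothetical.

```python
from fractions import Fraction

def average_hitting_time(mu, bad_states=100):
    """Uniform average of the expected hitting times over x_0, ..., x_3 and the bad states."""
    # With eps = 0: x_0 is optimal, x_1 -> x_0, x_2 -> x_1 -> x_0, x_3 -> x_0.
    good = [Fraction(0), Fraction(1), Fraction(2), Fraction(1)]
    # Each bad state behaves like x_4 in Example 16: 2 * 0.5^mu + 3 * (1 - 0.5^mu).
    worst = 3 - Fraction(1, 2) ** mu
    return (sum(good) + bad_states * worst) / (len(good) + bad_states)

for mu in (1, 2, 3):
    print(mu, float(average_hitting_time(mu)))
```

Under these assumptions the average grows with $\mu$, because the 100 bad states dominate the average and each of them has a hitting time that increases with the population size.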

In a sense, the counterexamples and theoretical results in previous sections might be explained intuitively 
as follows: let's view an EA as a car, its asymptotic convergence rate as speed, its asymptotic hitting time as 
time, and its population size as engine size. Then using a larger population size will definitely increase speed 
(e.g., from 50 miles per hour to 60 miles per hour), and reduce time (from 1/50 hours per mile to 1/60 hours 
per mile). However there is no guarantee to reduce the total time needed to reach the destination. Since 
there exist different roads towards the destination, sometimes the car is more likely to choose a longer road 
if using a larger population size. 

7 Conclusions 

This article provides a general theoretical analysis of super linear population speedup of elitist EAs that
employ global mutation and elitist selection operators. A new approach based on the fundamental matrix
is introduced to the analysis of the population speedup. The performance of an EA is measured by the
spectral radius of the fundamental matrix, called the asymptotic hitting time. It approximately equals the
reciprocal of the asymptotic convergence rate, i.e.,

$\text{asymptotic hitting time} \approx \frac{1 \sim 1.39}{\text{asymptotic convergence rate}}.$

The relationship between the asymptotic convergence rate and asymptotic hitting time is similar to that
between speed and time.

The following results have been proven by analysing absorbing Markov chains associated with elitist EAs. 

1. Increasing the population size does shorten the asymptotic hitting time. 

2. No super linear speedup is available when elitist EAs are used to maximize any regular monotonic 
fitness function. 

3. Potential super linear speedup may happen when elitist EAs are used to maximize deceptive fitness
functions.

4. "Bridgeable state" and "diversity preservation" are both necessary conditions for the super linear
speedup.

5. The "road through bridge" condition is sufficient and also necessary for the super linear speedup.

The "road through bridge" condition highlights two principles in designing efficient population-based EAs:

1. Using a population is more useful for multi-modal fitness functions, but less helpful for monotonic
fitness functions.

2. Enhancing the transition along the "road through bridge" is crucial in designing efficient EAs. There
are two important design issues in practice: one is to intelligently locate the bridge state(s) and another
is to construct the "road through bridge".

These results are consistent with the intuitions commonly used in practice. The analysis of population
scalability provides a solid theoretical foundation for these intuitive practices. Finally, counterexamples show
that similar intuitive conclusions cannot be established if the performance of EAs is measured by the worst
case or average case hitting time.